Abstract

We consider the problem of recovering a target matrix that is a superposition of low-rank and 
sparse components, from a small set of linear measurements. This problem arises in compressed 
sensing of structured high-dimensional signals such as videos and hyperspectral images, as well 
as in the analysis of transformation invariant low-rank recovery. We analyze the performance of 
the natural convex heuristic for solving this problem, under the assumption that measurements 
are chosen uniformly at random. We prove that this heuristic exactly recovers low-rank and 
sparse terms, provided the number of observations exceeds the number of intrinsic degrees of 
freedom of the component signals by a polylogarithmic factor. Our analysis introduces several 
ideas that may be of independent interest for the more general problem of compressed sensing 
and decomposing superpositions of multiple structured signals. 



1 Introduction 

In recent years, there has been tremendous interest in recovering low-dimensional structure in
high-dimensional signal or data spaces. This interest has been fueled by the discovery that efficient
techniques based on convex programming can accurately recover low-complexity signals such as sparse
vectors or low-rank matrices from severely compressive, incomplete, or even corrupted observations.

One representative example arises in Robust Principal Component Analysis (RPCA). There, the
goal is to recover a low-rank matrix $L_0$ from grossly corrupted observations. For example, suppose
we observe $M = L_0 + S_0$, where $S_0$ is a sparse error. Under mild conditions, the following convex
program, called Principal Component Pursuit (PCP) [CLMW11, CSPW11]:

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M, \tag{1.1}$$



precisely recovers $L_0$ and $S_0$. In (1.1), $\|\cdot\|_*$ is the matrix nuclear norm (sum of singular values)
and $\|\cdot\|_1$ is the $\ell^1$ norm (sum of magnitudes). For data analysis applications, this suggests that a
low-rank matrix $L_0$ can be recovered from the observation $M$ despite large-magnitude sparse errors.
This result has been extended and generalized in a number of directions: to include additional small
dense noise $M = L_0 + S_0 + N$ [ZLW+10], large fractions of random errors $S_0$ [GLW+10], and even
column-sparse or row-sparse errors [XSC11, MT11].

The conditions under which recovery is known to occur are fairly broad: provided the low-rank
term satisfies a technical incoherence condition, correct recovery can occur even when $\operatorname{rank}(L_0)$ is
almost proportional to the dimension of the matrix $M$, and the number of nonzero entries in $S_0$ is
proportional to the number of entries in $M$ [CLMW11]. On the other hand, in many applications
of interest, the rank may actually be significantly smaller than the dimension (say 3 [WGS+10], or 9
[BJ03]). Moreover, the cardinality of the sparse term may also be quite small. In such a situation
our number $mn$ of observations could be extravagantly large compared to the number of degrees of
freedom in the unknowns $L_0, S_0$. Is it possible to recover $L_0$ and $S_0$ from smaller sets of linear
measurements?



1.1 Compressive RPCA 

The low-rank and sparse model described above captures properties of many signals of interest,
including foreground and background in video surveillance [CLMW11], videos [GPX+11, SA11],
structured textures [ZGLM11], hyperspectral datacubes [WSB11, GV11] and more. The ability to
recover low-rank and sparse models from small sets of linear measurements could be very useful
for developing new sensing architectures for such signals [Don06, WSB11]. Mathematically, our
observations have the form

$$D = \mathcal{P}_Q[M] = \mathcal{P}_Q[L_0 + S_0], \tag{1.2}$$

where $Q \subseteq \mathbb{R}^{m \times n}$ is a linear subspace, and $\mathcal{P}_Q$ denotes the projection operator onto that
subspace. Can we simultaneously recover the low-rank and sparse components correctly from highly
compressive measurements via the natural convex program

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad \mathcal{P}_Q[L + S] = D \; ? \tag{1.3}$$

While this question is largely open, there is good reason to believe the answer may be positive. For
example, [CLMW11, Li11] have studied the "robust matrix completion" problem, with $\mathcal{P}_Q = \mathcal{P}_\Omega$,
where $\Omega$ is a small subset of the entries of the matrix. When $\mathcal{P}_Q = \mathcal{P}_\Omega$, it is impossible to exactly
recover $S_0$ (many of the entries are simply not observed!), but the low-rank term $L_0$ can be recovered
from near-minimal sets of random samples [Li11]. However, in many applications the sparse term
$S_0$ is actually the quantity of interest: for example, in visual surveillance, $S_0$ might capture moving
foreground objects. To recover both $L_0$ and $S_0$, we must require measurements $Q$ that are incoherent
with both the low-rank and the sparse term.



In this paper, we investigate the performance of (1.3) when $Q$ is a randomly chosen subspace
(incoherent with $L_0$ and $S_0$ with high probability). As the simulation results in Figure 1 suggest, as
long as the rank and sparsity are low enough, we can expect the convex program to correctly recover
both the low-rank and sparse components from a reduced set of random linear measurements. A
similar recovery problem was recently considered by [WSB11], again with the goal of designing
sensing strategies capable of recovering both $L_0$ and $S_0$. We will discuss the results of [WSB11] and
other related works in more detail in Section 3, after we have stated our main result.



1.2 Transformed RPCA 

Figure 1: Compressive sensing of low-rank and sparse matrices via convex program (1.3).
We solve the convex program for recovering an $m \times m$ matrix $M = L_0 + S_0$ with $m = 100$ from
$q = m^2 - p$ random linear measurements $\mathcal{P}_Q[M]$. For each subplot: the x-axis is the rank $r$ of the
matrix $L_0$ and the y-axis is the percentage of non-zero entries in $S_0$. The intensity is proportional to
the probability of success, with pure white meaning 100% (out of 10 random trials). Notice that
when $p = 5{,}000$, the number of linear measurements is only half of the number of entries and there
remains a small region where the convex program succeeds. These simulations use an accelerated
gradient algorithm with continuation, similar to [BCG10].

Aside from the perspective of compressive sensing, there are many other practical scenarios that
require recovering a low-rank matrix from partial, incomplete, or corrupted measurements. One
example is when the given data is a transformed version of the low-rank and sparse matrices:

$$M \circ \tau = L_0 + S_0, \tag{1.4}$$

where $\tau$ is an unknown nonlinear transformation from some continuous group $\mathbb{G}$. The goal is to
simultaneously recover $L_0$, $S_0$ and $\tau$ from $M$. One can view this as a "transformed RPCA" problem.
The constraint (1.4) is often highly nonlinear. One popular approach is to linearize the measurements
against parameters of the transformation:

$$M \circ \tau + J[\Delta\tau] \approx L_0 + S_0,$$

where $J$ is the Jacobian of $M \circ \tau$ with respect to the parameters of $\tau$. We can then solve for an
increment $\Delta\tau = \tau_{k+1} - \tau_k$ in the transformation parameters via the convex program:

$$\text{minimize}_{L, S, \Delta\tau} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad M \circ \tau_k + J[\Delta\tau] = L + S. \tag{1.5}$$

Mathematically, this program is equivalent to (1.3). To see this, let $Q$ be the orthogonal complement
to the range of $J$, so that $\mathcal{P}_Q J = 0$. Let

$$D = \mathcal{P}_Q[M \circ \tau + J[\Delta\tau]] = \mathcal{P}_Q[M \circ \tau] \approx \mathcal{P}_Q[L_0 + S_0].$$

After $\Delta\tau$ is eliminated in this way, the problem now becomes recovering the low-rank and sparse
components from $D$:

$$\text{minimize}_{L, S} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad D = \mathcal{P}_Q[L + S], \tag{1.6}$$

which has the same form as above.
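
The elimination step can be checked numerically. The following is a minimal sketch, assuming the Jacobian is stored as a dense matrix acting on vectorized increments; the sizes, the random stand-in for the transformed data, and the helper name `proj_Q` are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 3                      # hypothetical sizes: m x n data, p parameters

# Jacobian J maps a parameter increment (length p) to an m x n matrix;
# here it is stored as an (m*n) x p matrix acting on vectorized increments.
J = rng.standard_normal((m * n, p))

# Orthonormal basis for range(J); Q is the orthogonal complement of that range.
U, _ = np.linalg.qr(J)                 # U has shape (m*n, p)

def proj_Q(X):
    """Orthogonal projection of an m x n matrix onto Q = range(J)^perp."""
    x = X.reshape(-1)
    return (x - U @ (U.T @ x)).reshape(X.shape)

M_tau = rng.standard_normal((m, n))    # stands in for the transformed data
delta_tau = rng.standard_normal(p)     # an arbitrary parameter increment
J_delta = (J @ delta_tau).reshape(m, n)

D_with = proj_Q(M_tau + J_delta)       # projection of the linearized data
D_without = proj_Q(M_tau)              # projection with the increment dropped
```

Because `proj_Q` annihilates everything in the range of `J`, the two projected quantities agree for any increment, which is why the increment no longer appears in (1.6).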

Empirically, this iterative linearization scheme performs well in applications such as aligning
multiple images [GPX+11] and rectifying low-rank textures [ZGLM11]. Hence, it is important to
understand under what conditions we should expect the associated convex program to perform
correctly. However, there are some important differences from the compressive sensing scenario:

1. In the transformed RPCA case, we often are dealing with a finite dimensional deformation
group $\mathbb{G}$ whose dimension, say $p$, is either fixed (as in [ZGLM11]) or grows very slowly compared
to the number of entries in the matrix (as in [GPX+11]).

2. Unlike compressive sensing where the measurement operator $\mathcal{P}_Q[\cdot]$ can be arbitrarily chosen,
here it is determined by the given data and the associated transformation group. We can no
longer model it as a random projection. Hence, we hope to have deterministic conditions which
can be directly verified with the given data.



1.3 Compressive Sensing of Decomposable Components 

From both the compressive and transformed RPCA problems, we see the need to understand under
what conditions we should expect to correctly recover the low-rank and sparse components from
compressive or partial measurements $D = \mathcal{P}_Q[L_0 + S_0]$. In particular, we are interested in when
the convex program

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad D = \mathcal{P}_Q[L + S], \tag{1.7}$$

finds the correct solution $L_0$ and $S_0$. Following the terminology of [CLMW11], in this paper we
refer to this convex program as Compressive Principal Component Pursuit (CPCP).



One fundamental question is how many measurements $q$ are needed for the above program (1.7)
to correctly recover $L_0$ and $S_0$. Clearly, this number should be bounded from below by the number
of intrinsic degrees of freedom in $(L_0, S_0)$. Since a rank-$r$ matrix has $(m + n - r)r$ degrees of freedom,
the number of continuous degrees of freedom in the pair $(L_0, S_0)$ is equal to

$$(m + n - r)\,r + \|S_0\|_0,$$

where $\|\cdot\|_0$ denotes the number of nonzero entries in a matrix. Hence, the best we
can possibly hope for is a number of measurements $q$ on this order. We will show that when the
measurements are random (say Gaussian), the desired $(L_0, S_0)$ can indeed be exactly recovered from
a number of measurements that is very close to this lower bound: provided

$$\#\text{measurements} \;\ge\; O(\log^2 m) \times \#\text{degrees of freedom}(L_0, S_0),$$

the compressive principal component pursuit program (1.7) correctly recovers this pair with very
high probability. Notice that this bound is nearly optimal, differing from the hard lower bound by
only a polylogarithmic factor.
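
To get a feel for the sizes involved, the count of degrees of freedom and the oversampled measurement budget can be tabulated directly. This is a rough sketch: the constant `c` hidden in the $O(\cdot)$ is not specified in the text, so the value `c=1.0` below is purely an assumption, and the function names are illustrative.

```python
import math

def degrees_of_freedom(m, n, r, k):
    """(m + n - r) * r for the rank-r part, plus k nonzero sparse entries."""
    return (m + n - r) * r + k

def oversampled_measurements(m, n, r, k, c=1.0):
    """c * log^2(m) * degrees of freedom; c is a hypothetical constant."""
    return c * math.log(m) ** 2 * degrees_of_freedom(m, n, r, k)

m = n = 1000
r, k = 5, 10_000
dof = degrees_of_freedom(m, n, r, k)
budget = oversampled_measurements(m, n, r, k)
full = m * n                      # number of entries in the fully observed matrix
```

With these illustrative values, the budget sits between the raw degrees of freedom and the full number of entries, which is the regime in which compressive measurements could save anything at all.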

Our analysis actually pertains to a much more general class of problems of decomposing a given
observation into multiple incoherent components:

$$\text{minimize} \quad \sum_i \lambda_i \|X_i\|_{(i)} \quad \text{subject to} \quad \sum_i X_i = M. \tag{1.8}$$

Here, $\|\cdot\|_{(i)}$ are (decomposable) norms that encourage various types of low-complexity structure.
Principal Component Pursuit [CLMW11, CSPW11], Outlier Pursuit [XSC11, MT11] and Morphological
Component Analysis [BSE07] are all special cases of this general problem. Roughly speaking,
our analysis will suggest that, if the above program succeeds in recovering all the components $\{X_i\}$
from $M$, one should also expect to recover them from the highly compressive measurements $\mathcal{P}_Q[M]$.
The number of measurements required is again governed by the intrinsic degrees of freedom of $\{X_i\}$,
multiplied by at most a $\operatorname{polylog}(m)$ factor. Thus, we believe the results in this paper are not limited
to decomposing low-rank and sparse signals, but also apply to a broad class of source separation
or signal decomposition problems that may arise in signal processing, communications, and pattern
recognition.

The remainder of this paper is organized as follows. In Section 2, we first introduce the precise
mathematical model and present the main technical results of this paper. In Section 3, we discuss
its implications and relationships with existing work in the literature. Section 4 discusses the more
general setting of (1.8) and lays out the framework of our analysis. The remaining sections complete
the proof of our main results.



2 Models and Main Results 

Our main technical contribution is a procedure for producing a certificate of optimality for $(L_0, S_0)$
for the Compressive Principal Component Pursuit problem, given that the pair is optimal for Principal
Component Pursuit. In this sense, our mathematical approach is modular: it partially decouples
the analysis of the compressive measurements from the analysis of the core low-rank and sparse
recovery problem. Combining with existing models and analyses of PCP, we can prove that the pair
$(L_0, S_0)$ is indeed recoverable by the convex optimization.

We first recall conditions under which $M = L_0 + S_0$ can be exactly separated into its constituents
by PCP. Intuitively, we should not expect to recover all possible low-rank and sparse pairs
$(L_0, S_0)$. Indeed, imagine the case when $M$ is rank-one and one-sparse (i.e., $M = e_i e_j^*$ for some
$i, j$). In this situation the answers $(L = e_i e_j^*, S = 0)$ and $(L = 0, S = e_i e_j^*)$ both seem reasonable:
the decomposition problem is ambiguous!
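
The ambiguity is easy to reproduce. The sketch below builds a matrix with a single nonzero entry and evaluates the PCP objective for the two competing explanations; the sizes, the index choice, and the weight `lam` are illustrative assumptions rather than values from the paper.

```python
import numpy as np

m, n, i, j = 4, 5, 1, 3
M = np.zeros((m, n))
M[i, j] = 1.0                          # one nonzero entry: rank one and one-sparse

def nuclear_norm(X):
    """Sum of singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()

def l1_norm(X):
    """Sum of entry magnitudes."""
    return np.abs(X).sum()

def objective(L, S, lam):
    """PCP objective: nuclear norm of L plus lam times the l1 norm of S."""
    return nuclear_norm(L) + lam * l1_norm(S)

lam = 1.0 / np.sqrt(max(m, n))         # illustrative weight, an assumption
Z = np.zeros_like(M)
obj_lowrank = objective(M, Z, lam)     # explain M entirely as low-rank
obj_sparse = objective(Z, M, lam)      # explain M entirely as sparse
```

Both pairs are feasible for $L + S = M$, and which one the program prefers depends only on the weight, not on any structure in the data. This is the situation the incoherence and randomness assumptions are meant to exclude.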

To make the problem meaningful, we need conditions that ensure that (i) the low-rank term $L_0$
does not "look sparse" and (ii) the sparse term $S_0$ does not "look low-rank." One popular way of
formalizing the first intuition is via the notion of incoherence introduced by [CR08]. If
the low-rank matrix $L_0$ has rank-reduced singular value decomposition $L_0 = U \Sigma V^*$, then we say
that $L_0$ is $\mu$-incoherent if

$$\forall i \;\; \|U^* e_i\|_2^2 \le \frac{\mu r}{m}, \qquad \forall j \;\; \|V^* e_j\|_2^2 \le \frac{\mu r}{n}, \qquad \text{and} \qquad \|U V^*\|_\infty \le \sqrt{\frac{\mu r}{mn}}. \tag{2.1}$$

Intuitively, these conditions ensure that the singular vectors of $L_0$ are not too concentrated on only a
few coordinates: the singular vectors do not "look sparse." For further discussion of the implications
of this condition, we refer the reader to [CR08].
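
As an illustration of (2.1), the smallest admissible incoherence parameter can be computed from a singular value decomposition. This is a minimal sketch; the function name `incoherence` and the two test matrices are illustrative, and the rank `r` is assumed to be known.

```python
import numpy as np

def incoherence(L, r):
    """Smallest mu for which all three bounds in (2.1) hold for a rank-r matrix L."""
    m, n = L.shape
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    # max_i ||U^* e_i||^2 <= mu * r / m, solved for mu
    mu_U = (m / r) * np.max(np.sum(U**2, axis=1))
    # max_j ||V^* e_j||^2 <= mu * r / n, solved for mu
    mu_V = (n / r) * np.max(np.sum(V**2, axis=1))
    # ||U V^*||_inf <= sqrt(mu * r / (m n)), solved for mu
    mu_UV = (m * n / r) * np.max(np.abs(U @ V.T)) ** 2
    return max(mu_U, mu_V, mu_UV)

flat = np.ones((4, 4))                 # rank one, energy spread over all entries
spiky = np.zeros((4, 4))
spiky[0, 0] = 1.0                      # rank one, energy on a single entry

mu_flat = incoherence(flat, 1)
mu_spiky = incoherence(spiky, 1)
```

The flat matrix attains the smallest possible value, while the single-spike matrix attains the largest possible value for its size, matching the intuition that the latter "looks sparse."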

At the same time, we need to ensure that the sparse term does not "look low-rank." One
appealing way of doing this is via a random model: we assume that each $(i, j)$ is an element of
$\operatorname{supp}(S_0)$ independently with probability $\rho$ bounded by some small constant. We assume that
the signs of the nonzero entries are independent symmetric $\pm 1$ random variables (i.e., Rademacher
random variables). In stating our theorems, we call such a distribution an "iid Bernoulli-Rademacher
model."
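
A sign pattern from this model can be sampled in a few lines. The sketch below is illustrative only: the function name, the matrix size, and the uniform magnitudes attached afterwards are assumptions, since the model itself only constrains the support and the signs.

```python
import numpy as np

def bernoulli_rademacher(m, n, rho, rng):
    """Each entry is nonzero with probability rho; nonzero signs are +1 or -1 with equal probability."""
    support = rng.random((m, n)) < rho
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return np.where(support, signs, 0.0)

rng = np.random.default_rng(0)
sign_S0 = bernoulli_rademacher(200, 200, rho=0.1, rng=rng)

# Magnitudes are not constrained by the model; uniform values are an arbitrary choice.
magnitudes = rng.uniform(1.0, 10.0, size=sign_S0.shape)
S0 = sign_S0 * magnitudes
fraction_nonzero = np.mean(sign_S0 != 0)
```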

Thus far, we have discussed only the low-rank and sparse terms, but not the properties of the
measurements $Q$. We will give a result for the case when $Q$ is chosen uniformly at random from the
set of all $q$-dimensional subspaces of $\mathbb{R}^{m \times n}$. More precisely, $Q$ is distributed according to the Haar
measure on the Grassmannian $G(\mathbb{R}^{m \times n}, q)$. This means that the distribution of $Q$ is rotationally
invariant. On a more intuitive level, this means that $Q$ is equal in distribution to the linear span of a
collection of $q$ independent iid $\mathcal{N}(0, 1)$ matrices. In notation more familiar from compressed sensing,
we may let $Q_1, \dots, Q_q$ denote such a set of matrices, and define an operator $\mathcal{Q} : \mathbb{R}^{m \times n} \to \mathbb{R}^q$ via

$$\mathcal{Q}[M] = \left( \langle Q_1, M \rangle, \dots, \langle Q_q, M \rangle \right)^* \in \mathbb{R}^q. \tag{2.2}$$

Our analysis also pertains to the equivalent convex program:

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad \mathcal{Q}[L + S] = \mathcal{Q}[L_0 + S_0]. \tag{2.3}$$

Indeed, since $\mathcal{Q}$ has full rank $q$ almost surely, (2.3) and (1.7) are completely equivalent.
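
The equivalence between the inner-product form and the projection form of the constraint can be illustrated numerically. This is a small sketch under assumed sizes; the helper names `measure` and `proj_Q` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q = 5, 4, 8
Qs = rng.standard_normal((q, m, n))          # q matrices with iid N(0, 1) entries

def measure(M):
    """Vector of inner products of M with each of the q Gaussian matrices."""
    return np.tensordot(Qs, M, axes=([1, 2], [0, 1]))

# Orthonormal basis (columns of B) for the span of the vectorized Gaussian matrices.
B, _ = np.linalg.qr(Qs.reshape(q, -1).T)     # shape (m*n, q)

def proj_Q(M):
    """Orthogonal projection of M onto the span of the q Gaussian matrices."""
    return (B @ (B.T @ M.reshape(-1))).reshape(M.shape)

X = rng.standard_normal((m, n))
W = rng.standard_normal((m, n))
N = W - proj_Q(W)                            # a direction orthogonal to the span
Y = X + N                                    # differs from X only outside the span
```

Here `X` and `Y` produce the same measurement vector and the same projection, so either form of the constraint identifies the same feasible set.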

Under this setting, the following theorem gives a tight bound on the number of (random)
measurements required to correctly recover the pair $(L_0, S_0)$ from $\mathcal{P}_Q[M]$ via CPCP:

Theorem 2.1 (Compressive PCP Recovery). Let $L_0, S_0 \in \mathbb{R}^{m \times n}$, with $m \ge n$, and suppose
that $L_0 \ne 0$ is a rank-$r$, $\mu$-incoherent matrix with

$$r \le \frac{c_r \, n}{\mu \log^2 m} \tag{2.4}$$

and $\operatorname{sign}(S_0)$ is iid Bernoulli-Rademacher with nonzero probability $\rho < c_\rho$. Let $Q \subseteq \mathbb{R}^{m \times n}$ be a
random subspace of dimension

$$\dim(Q) \ge C_Q \cdot (\rho m n + m r) \cdot \log^2 m \tag{2.5}$$

distributed according to the Haar measure, probabilistically independent of $\operatorname{sign}(S_0)$. Then with
probability at least $1 - C m^{-9}$ in $(\operatorname{sign}(S_0), Q)$, the solution to

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad \mathcal{P}_Q[L + S] = \mathcal{P}_Q[L_0 + S_0] \tag{2.6}$$

with $\lambda = 1/\sqrt{m}$ is unique, and equal to $(L_0, S_0)$. Above, $c_r, c_\rho, C_Q, C$ are positive numerical
constants.



Here, the magnitudes of the nonzeros in $S_0$ are arbitrary, and no randomness is assumed in $L_0$.
The randomness in our main result is in the sign and support pattern of $S_0$ and the measurements
$Q$. We note in passing that the randomness in the signs of $S_0$ can be removed using the techniques
of [CLMW11], Sections 2.1-2.2. We will not pursue this in great depth here.

We also note that the bounds on $r$ and $\rho$ essentially match those of [CLMW11], possibly with
different constants. So, again, $r$ and $\|S_0\|_0$ can be rather large. On the other hand, when these
quantities are small, the bound on $\dim(Q)$ ensures that the number of measurements needed for
accurate recovery is also commensurately small. We will compare our results to other works from
the literature in the next section. First, we pause to fix some notation.

Notation. Bold uppercase letters $A, B, \dots$ denote matrices. Bold lowercase letters $x, y$ denote
vectors. Script uppercase letters $\mathcal{A}, \mathcal{B}, \dots$ denote operators on matrices. In particular, if $S \subseteq \mathbb{R}^{m \times n}$
is a linear subspace, we will let $\mathcal{P}_S$ denote the orthogonal projection onto $S$. The notations $C$, $c$
will always refer to numerical constants. When used in different sections they may not refer to the
same constant. All logarithms are base-$e$. "$\oplus$" denotes a direct sum between linearly independent
subspaces. When applied to subsets of a vector space, "$+$" will denote Minkowski summation, i.e.,
$A + B = \{a + b \mid a \in A,\ b \in B\}$.

Definition 2.2. We will say that subspaces $S_1, \dots, S_k$ are independent if

$$\dim(S_1 + \cdots + S_k) = \dim(S_1) + \cdots + \dim(S_k).$$
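
Definition 2.2 translates directly into a rank test when each subspace is given by a spanning set. The following sketch assumes each subspace is passed as a matrix whose columns span it; the function name and the coordinate subspaces are illustrative.

```python
import numpy as np

def are_independent(*bases):
    """Check dim(S_1 + ... + S_k) == dim(S_1) + ... + dim(S_k).

    Each argument is a matrix whose columns span one subspace.
    """
    dim_of_sum = np.linalg.matrix_rank(np.hstack(bases))
    sum_of_dims = sum(np.linalg.matrix_rank(B) for B in bases)
    return dim_of_sum == sum_of_dims

e = np.eye(4)
S1 = e[:, [0, 1]]      # span of the first two coordinate vectors
S2 = e[:, [2]]         # span of the third coordinate vector
S3 = e[:, [1, 2]]      # overlaps with both S1 and S2
```

Note that the rank computation uses a numerical tolerance, so for nearly dependent subspaces the answer depends on that tolerance.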

3 Relationship to the Literature 

As mentioned above, in recent years there has been a large amount of work on matrix recovery and
decomposition; see, for example, [CLMW11, CSPW11, ZLW+10, GLW+10, XSC11, MT11, ANW11,
HKZ11] and references therein. The aforementioned works all pertain to the case when the matrix $M$
is fully observed, and hence are not directly comparable to our result. In Section 4, we will see that
our analysis gives a tool for transforming a certificate of optimality for the fully observed problem
into a certificate of optimality for the compressive problem. Because this technique is modular, it
may be possible to apply it in conjunction with the aforementioned works to prove correct recovery
under different assumptions, and even with different regularizers.

Compared to the fully observed problem, there is much less dedicated work on low-rank and
sparse recovery from compressive measurements. Recently, motivated by applications in compressive
foreground and background separation and compressive hyperspectral image acquisition, [WSB11]
introduced a greedy algorithm for this problem, which aims at the objective function

$$\text{minimize}_{L, S} \quad \|D - \mathcal{P}_Q[L + S]\|_2 \quad \text{subject to} \quad \operatorname{rank}(L) \le r, \;\; \|S\|_0 \le k. \tag{3.1}$$

Their algorithm is similar in spirit to the CoSaMP algorithm of [NT08] for recovering sparse
signals, and performs well on numerical examples. Analyzing its behavior theoretically and proving
performance guarantees is currently an open problem.

As the body of results on specific problems such as matrix recovery grows, there has been
an increasing interest in unifying or generalizing the basic insights obtained from studying special
cases. A number of groups have produced results that pertain to general structured regularizers. For
example, Negahban et al. [NRWY10] have introduced a general geometric framework for analyzing
low-complexity signal recovery, highlighting the role of the regularizer in overcoming a lack of strong
convexity in the loss. Agarwal et al. [ANW11] use this framework to analyze sparse and low-rank
decomposition, and have obtained tight results for estimation in noise, stronger than previously
known results by [ZLW+10]. Their analysis proceeds under different (weaker) assumptions, which
preclude exact recovery.

In a similar vein, Chandrasekaran et al. [CRPW10] have recently produced a very general analysis
of structured signal recovery with Gaussian measurements. That work exploits the geometry of the
atomic norm ball, in particular relating the required number of measurements to the Gaussian
width of the tangent cone at the desired solution. Based on this, they give tight bounds on the
number of measurements needed to recover a low-rank matrix or sparse vector. However, once the
atomic set contains both low-rank and sparse matrices, it is less clear how to analyze the Gaussian
width of the tangent cone. Indeed, the non-trivial analysis in [CLMW11, CSPW11] can be viewed
as simply showing that the desired solution lies on the boundary of the norm ball. Estimating the
width of the tangent cone at that point seems to entail additional analytical difficulty.

For Gaussian measurements, the recent work of Candès and Recht [CR11] also gives simple
bounds for exact recovery, under the assumption that the regularizer (or norm) is decomposable. If
we wished to apply similar analysis to our problem, we would need to work with the quotient norm
on $M$:

$$\|M\|_{\diamond} = \inf_{L + S = M} \|L\|_* + \lambda \|S\|_1. \tag{3.2}$$

This is the infimal convolution of two decomposable terms. Its subdifferential has a number of
nice properties which we will exploit in our analysis, but decomposability (in the sense of [CR11])
does not appear to be one of them. Nevertheless, the results in this paper show that under suitable
conditions, we should expect the same type of compressive sensing results for this class of generalized
norms for superpositions of low-complexity components.

In this paper, we generalize the analysis of decomposable regularizers to their sums (or strictly
speaking infimal convolutions) and obtain nearly optimal bounds on the required number of
measurements for exact recovery and decomposition of low-complexity components via convex optimization.
In particular, our results provide strong theoretical justification for conducting robust principal
component analysis with highly compressive measurements. Because our results assume a random model
for the operator $Q$, of the two application scenarios described in the introduction, our results are
likely to be more applicable to the compressive sampling scenario. Indeed, the challenge for
analyzing transformed matrix recovery problems seems not to lie in elucidating the absolute minimum
number of "measurements", but rather in dealing with dependencies between the operator $Q$ and
the solution of interest $(L_0, S_0)$. In a companion paper [GMWM12], we give results for deterministic
operators $Q$, which may be applicable to that situation.



4 General Certificate Upgrades 



In this section, we present the technical result used to obtain Theorem 2.1 above. As promised,
this result will have implications for compressive variants of a large number of conceivable signal
decomposition problems. In full generality, we can imagine that the fully observed data $M$ are given
as a sum of structured terms:

$$M = X_1 + X_2 + \cdots + X_\tau, \tag{4.1}$$

where each $X_i$ satisfies a low-complexity model such as sparsity or rank-deficiency, possibly also
including more exotic types of structured sparsity [Bac10]. For each type of structure, we have a
corresponding regularizer $\|\cdot\|_{(i)}$. The natural convex heuristic for decomposing $M$ into its components
would solve

$$\text{minimize} \quad \sum_{i=1}^{\tau} \lambda_i \|X_i\|_{(i)} \quad \text{subject to} \quad \sum_{i=1}^{\tau} X_i = M, \tag{4.2}$$



where the $\lambda_i > 0$ are scalar weight factors. Many authors have studied special cases of this problem,
and given conditions under which correct decomposition occurs. A prime example is Principal
Component Pursuit; others include Outlier Pursuit [XSC11, MT11] and Morphological Component
Analysis [BSE07].



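The goal of this paper is not to study (4.2) per se, but rather to understand what happens to
it when we only observe compressive measurements of $M$ (or when $M$ itself is subject to some
transformation):

$$\text{minimize} \quad \sum_{i=1}^{\tau} \lambda_i \|X_i\|_{(i)} \quad \text{subject to} \quad \mathcal{P}_Q\Big[\sum_{i=1}^{\tau} X_i\Big] = \mathcal{P}_Q M. \tag{4.3}$$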



Suppose we know that (4.2) correctly decomposes $M$ into $X_1, \dots, X_\tau$. Does this imply that (4.3)
can also recover $X_1, \dots, X_\tau$? At a slightly more technical level, we can ask whether a certificate
of optimality for the decomposition problem (4.2) can be refined to also certify optimality for the
compressive decomposition problem (4.3). Theorem 4.7 below will imply that this is true under
broad circumstances. Provided we have proved optimality for (4.2), we can move to optimality for
(4.3), as long as the number of measurements $\dim(Q)$ is sufficiently large. In this sense, our analysis
is modular: any technique can be used to perform the analysis of the original decomposition problem,
provided it constructs an (approximate) dual certificate.



Duality and Optimality. Our result pertains to decomposable norms $\|\cdot\|_{(i)}$ [NRWY10, CR11].
This notion includes many sparsity-inducing norms, such as the $\ell^1$ norm and nuclear norm (as
above), as well as sums of block $\ell^p$ norms.

Definition 4.1. We say that a norm $\|\cdot\|$ is decomposable at $X$ if there exists a subspace $T$ and
a matrix $S$ such that

$$\partial \|\cdot\|(X) = \{ \Lambda \mid \mathcal{P}_T \Lambda = S, \; \|\mathcal{P}_{T^\perp} \Lambda\|^* \le 1 \}, \tag{4.4}$$

where $\|\cdot\|^*$ denotes the dual norm of $\|\cdot\|$, and $\mathcal{P}_{T^\perp}$ is nonexpansive with respect to $\|\cdot\|^*$.
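
As a numerical sanity check of this definition, the sketch below uses the nuclear norm. It relies on the standard characterization of the nuclear norm subdifferential, which is not spelled out in the text and is taken here as an assumption: $T$ is the set of matrices of the form $U B^* + C V^*$, $S = U V^*$, and the dual norm is the operator norm. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 5, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # a rank-r matrix
U, _, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T

def proj_T(A):
    """Projection onto T, the matrices sharing a row or column space with X."""
    return U @ (U.T @ A) + (A @ V) @ V.T - U @ (U.T @ A @ V) @ V.T

def nuclear(A):
    return np.linalg.svd(A, compute_uv=False).sum()

def operator_norm(A):
    return np.linalg.norm(A, 2)

S = U @ V.T
W = rng.standard_normal((m, n))
W_perp = W - proj_T(W)                            # component orthogonal to T
Lam = S + 0.9 * W_perp / operator_norm(W_perp)    # P_T Lam = S, dual norm off T is 0.9

# A subgradient must satisfy nuclear(Y) >= nuclear(X) + <Lam, Y - X> for every Y.
gaps = []
for _ in range(100):
    Y = rng.standard_normal((m, n))
    gaps.append(nuclear(Y) - nuclear(X) - np.sum(Lam * (Y - X)))
min_gap = min(gaps)
```

The subgradient inequality holds on every random trial, which is what membership in the set (4.4) requires; random trials are of course only a spot check, not a proof.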

For example, the $\ell^1$ norm satisfies this definition with $T = \operatorname{supp}(X)$ and $S = \operatorname{sign}(X)$. The
above definition is completely equivalent to that of [CR11]. It is also related to the definition of
[NRWY10], but not strictly equivalent to it.¹ We assume that each $\|\cdot\|_{(i)}$ is decomposable at the
target solution $X_{i,\star}$, so per the above definition we have a sequence of subspaces $T_i$ and matrices
$S_i$ that define the subdifferentials of each of the regularizers $\|\cdot\|_{(i)}$. With this notation in mind, we
can state a simple sufficient optimality condition for (4.3):



¹ To be clear, in the sense of [NRWY10], a norm $\|\cdot\|$ is decomposable over a subspace pair $T, T^\perp$ if for all $x \in T$,
$y \in T^\perp$, $\|x + y\| = \|x\| + \|y\|$. If $x \in T$, and $\|\cdot\|$ is decomposable over $T, T^\perp$ in the sense of [NRWY10], and the
restriction of $\|\cdot\|$ to $T$ is differentiable at $x$, then it is decomposable in the sense of Definition 4.1. On the other hand,
norms that are decomposable in the sense of Definition 4.1 need not be decomposable in the sense of [NRWY10], and
so the two notions are not strictly comparable.

Lemma 4.2. Consider a feasible solution $x_\star = (X_{1,\star}, \dots, X_{\tau,\star})$ to (4.3). Suppose that each of the
norms $\|\cdot\|_{(i)}$ is decomposable at $X_{i,\star}$. If $T_1, \dots, T_\tau, Q^\perp$ are independent subspaces and there exists
$\Lambda$ satisfying $\mathcal{P}_{T_i} \Lambda = \lambda_i S_i$ and $\|\mathcal{P}_{T_i^\perp} \Lambda\|^*_{(i)} < \lambda_i$ for each $i$, and $\mathcal{P}_{Q^\perp} \Lambda = 0$, then $x_\star$ is the unique
optimal solution to (4.3).



Notice that this condition implies that $\Lambda$ lies in the subdifferential of $\lambda_i \|\cdot\|_{(i)}$ at $X_{i,\star}$ for each $i$. The
proof of Lemma 4.2 follows a familiar form, and is given in the appendix. Notice that if we take
$Q = \mathbb{R}^{m \times n}$ in Lemma 4.2, we obtain a sufficient optimality condition for the original decomposition
problem (4.2). The condition given by Lemma 4.2 is not so convenient to work with directly,
because it demands that $\Lambda$ exactly satisfy a set of equality constraints $\mathcal{P}_{T_i} \Lambda = \lambda_i S_i$. One very
useful device, due to Gross [Gro11], is to trade off between the equality constraints and the dual
norm inequality constraints $\|\mathcal{P}_{T_i^\perp} \Lambda\|^*_{(i)} < \lambda_i$, tightening the latter while loosening the former. The
following definition gives this idea a name:

Definition 4.3. We call $\Lambda$ an $(\alpha, \beta)$-inexact certificate for a putative solution $(X_{1,\star}, \dots, X_{\tau,\star})$
to (4.2) with parameters $(\lambda_1, \dots, \lambda_\tau)$ if for each $i$, $\|\mathcal{P}_{T_i} \Lambda - \lambda_i S_i\|_F \le \alpha$, and $\|\mathcal{P}_{T_i^\perp} \Lambda\|^*_{(i)} \le \lambda_i \beta$.

Comparing to the optimality condition in Lemma 4.2, we can see that this definition is most
meaningful when $\alpha$ is small, and $\beta < 1$. Indeed, a number of simple and powerful analyses of
problems such as matrix completion and robust low-rank matrix recovery proceed by constructing
an inexact certificate for which $\alpha$ is polynomial in $m$, and $\beta$ is a moderate constant, say, 1/2.
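
For the low-rank plus sparse case, the two residuals in Definition 4.3 can be evaluated directly for any candidate $\Lambda$. The sketch below is only an illustration: it assumes the nuclear norm with weight 1 paired with the $\ell^1$ norm with weight `lam`, uses the standard tangent space of the nuclear norm (an assumption not stated in the text), and the function name and toy inputs are hypothetical.

```python
import numpy as np

def certificate_parameters(Lam, U, V, sign_S, lam):
    """Return the residuals (alpha, beta) of Definition 4.3 for a candidate Lam.

    Nuclear norm term has weight 1; l1 term has weight lam.
    """
    support = sign_S != 0
    # Projection of Lam onto the tangent space of the low-rank term.
    PT = U @ (U.T @ Lam) + (Lam @ V) @ V.T - U @ (U.T @ Lam @ V) @ V.T
    # Equality-constraint residuals, measured in Frobenius norm.
    a_lowrank = np.linalg.norm(PT - U @ V.T)
    a_sparse = np.linalg.norm(np.where(support, Lam - lam * sign_S, 0.0))
    # Dual norms on the complements, divided by the corresponding weights.
    b_lowrank = np.linalg.norm(Lam - PT, 2) / 1.0
    b_sparse = np.max(np.abs(np.where(support, 0.0, Lam))) / lam
    return max(a_lowrank, a_sparse), max(b_lowrank, b_sparse)

n = 4
U = np.ones((n, 1)) / np.sqrt(n)       # flat rank-one singular vectors
V = np.ones((n, 1)) / np.sqrt(n)
sign_S = np.zeros((n, n))              # toy case with an empty sparse support
lam = 1.0 / np.sqrt(n)
alpha, beta = certificate_parameters(U @ V.T, U, V, sign_S, lam)
```

In this toy case the candidate satisfies the equality constraints exactly, and the dual norm bound holds with room to spare, so it would qualify as an inexact certificate with a comfortable margin.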



Definition 4.3 pertains to the decomposition problem (4.2), and does not involve the measurement
operator $Q$ in any way. Adding one additional constraint, $\mathcal{P}_{Q^\perp} \Lambda = 0$, we obtain an inexact certificate
for the compressive decomposition problem (4.3):

Definition 4.4. We call $\Lambda$ an $(\alpha, \beta)$-inexact certificate for a putative solution $(X_{1,\star}, \dots, X_{\tau,\star})$
to (4.3) with parameters $(\lambda_1, \dots, \lambda_\tau)$ if

(i) $\Lambda$ is an $(\alpha, \beta)$-inexact certificate for (4.2), and

(ii) $\mathcal{P}_{Q^\perp} \Lambda = 0$.

As we will see, an inexact certificate is easier to produce than the "exact" $\Lambda$ demanded in the
optimality condition of Lemma 4.2. Is it still sufficient to certify optimality? The following lemma
shows the answer is yes, provided $\alpha$ and $\beta$ are small enough:



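Lemma 4.5. Consider a feasible solution $x_\star = (X_{1,\star}, \dots, X_{\tau,\star})$ to the optimization problem (4.3).
Suppose that each of the norms $\|\cdot\|_{(i)}$ is decomposable at $X_{i,\star}$, and that each of the $\|\cdot\|_{(i)}$ majorizes
the Frobenius norm. Then if $T_1, \dots, T_\tau, Q^\perp$ are independent subspaces with

$$\|\mathcal{P}_{T_i} \mathcal{P}_{T_j}\| < \frac{1}{\tau - 1} \qquad \text{for all } i \ne j, \tag{4.5}$$

and there exists an $(\alpha, \beta)$-inexact certificate $\Lambda$, with

$$[\ldots] \;\Big/\; \Big[ \left(1 - \|\mathcal{P}_{Q^\perp} \mathcal{P}_{T_1 + \cdots + T_\tau}\|^2\right) \sqrt{1 - (\tau - 1) \max_{i \ne j} \|\mathcal{P}_{T_i} \mathcal{P}_{T_j}\|} \; \min_i \lambda_i \Big] < 1, \tag{4.6}$$

then $x_\star$ is the unique optimal solution.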

We prove this lemma in Section B using a least squares perturbation argument. The additional
technical condition that $\|\cdot\|_{(i)}$ majorizes the Frobenius norm (i.e., for all $X$, $\|X\|_{(i)} \ge \|X\|_F$) is
immediately satisfied by sparsity-inducing norms such as the nuclear and $\ell^1$ norms. In any case, it
can always be ensured by rescaling.

Remark 4.6. The denominator in the condition of Lemma 4.5 depends on our knowledge of the
relative orientation of the subspaces $T_1, \dots, T_\tau$ and $Q$. We have stated the lemma in a way that
assumes bounds on the angles of each pair $(T_i, T_j)$ and between $T_1 + \cdots + T_\tau$ and $Q$, but demands
no additional knowledge. A tighter accounting is possible if more is known about the configuration
of $(T_1, \dots, T_\tau, Q)$.



Thus, to show that $X_1, \dots, X_\tau$ solve the compressive decomposition problem (4.3), we just have
to produce an inexact certificate $\Lambda$ following the specification of Definition 4.4 with $(\alpha, \beta)$ sufficiently
small. This is fortuitous, since many existing analyses of the original decomposition problem (4.2)
already give certificates for that problem. For example, for Principal Component Pursuit, we can
leverage existing constructions in [CLMW11]. To prove that the desired solution remains optimal
even when we only see a few measurements $Q$, we will show that a certificate for (4.2) can be
"upgraded" to a certificate for (4.3), with very high probability in the choice of random $Q$, and only
a small loss in the parameters $(\alpha, \beta)$.

Of course, intuitively speaking, this should only be possible if the number of measurements is
sufficient: if the number of measurements in $Q$ is smaller than the number of degrees of freedom in $x_\star$,
then reconstruction from the compressive measurements $\mathcal{P}_Q M$ should not be possible. Interestingly,
however, we will see that the number of measurements does not need to be too much larger than
the number of degrees of freedom in $x_\star$: oversampling by $O(\log^2 m)$ will suffice. We have been a
bit vague about what we mean by the number of degrees of freedom in the signal. To be precise,
our theorem will refer to the quantity $\dim(T_1 + \cdots + T_\tau)$. Indeed, for the $\ell^1$ norm, $\dim(T_i)$ is the
number of nonzero entries in the solution $X_i$. For the nuclear norm, one can check that $\dim(T_i)$ is
the number of degrees of freedom in specifying a matrix whose rank is equal to that of $X_i$.
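
The claim for the nuclear norm can be verified numerically on a small example. The sketch below assumes the standard tangent space of the nuclear norm at a rank-$r$ matrix (matrices sharing its row or column space), which the text does not spell out, and builds the corresponding projection as an explicit matrix; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # a rank-r matrix
U, _, Vt = np.linalg.svd(X, full_matrices=False)
PU = U[:, :r] @ U[:, :r].T             # projector onto the column space of X
PV = Vt[:r, :].T @ Vt[:r, :]           # projector onto the row space of X

# Matrix of the map A -> PU A + A PV - PU A PV acting on row-major vec(A),
# using vec(B A C) = kron(B, C^T) vec(A) and the symmetry of PV.
I_m, I_n = np.eye(m), np.eye(n)
P_T = np.kron(PU, I_n) + np.kron(I_m, PV) - np.kron(PU, PV)

# The trace of an orthogonal projection equals the dimension of its range.
dim_T = int(round(np.trace(P_T)))
expected = (m + n - r) * r
```

The dimension of the tangent space matches the count $(m + n - r) r$ used earlier for a rank-$r$ matrix.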

Our main theorem states that with very high probability it is possible to "upgrade" a certificate for the decomposition problem (4.2) to one for the compressive decomposition problem (4.3), with only small loss in parameters $(\alpha, \beta)$. As it turns out, the loss in the dual norm $\|\cdot\|_{(i)}^*$ will be bounded by the expected dual norm of a standard Gaussian matrix. We will let $\nu_i$ denote this quantity:

$$\nu_i = \mathbb{E}\left[\|G\|_{(i)}^*\right], \qquad G \sim_{\mathrm{iid}} \mathcal{N}(0,1). \tag{4.7}$$
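The quantity in (4.7) is easy to estimate by simulation. The following is a minimal Monte Carlo sketch for the two norms relevant to PCP, assuming the standard facts that the dual of the nuclear norm is the operator norm and the dual of the $\ell^1$ norm is the $\ell^\infty$ norm. The function name `expected_dual_norm`, the matrix sizes, and the number of trials are illustrative choices.

```python
import numpy as np

def expected_dual_norm(dual_norm, m, n, trials=200, seed=0):
    """Monte Carlo estimate of nu = E ||G||^* for an m x n iid N(0, 1) matrix G."""
    rng = np.random.default_rng(seed)
    samples = [dual_norm(rng.standard_normal((m, n))) for _ in range(trials)]
    return float(np.mean(samples))

def operator_norm(G):
    # dual of the nuclear norm: largest singular value
    return np.linalg.norm(G, 2)

def max_abs_entry(G):
    # dual of the l1 norm: largest entry in magnitude
    return np.max(np.abs(G))

m, n = 60, 40
nu_nuclear = expected_dual_norm(operator_norm, m, n)   # on the order of sqrt(m) + sqrt(n)
nu_l1 = expected_dual_norm(max_abs_entry, m, n)        # on the order of sqrt(2 log(mn))
```

The comments give orders of magnitude only; no specific numerical output is claimed.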



We have the following theorem:

Theorem 4.7 (Certificate Upgrade). Consider the general decomposition problem (4.2), and suppose that each of the norms $\|\cdot\|_{(i)}$ majorizes the Frobenius norm. Let $x_\star = (X_{1,\star}, \dots, X_{\tau,\star})$ be feasible for (4.2), and suppose there exists an $(\alpha, \beta)$-inexact certificate $\Lambda$ for $x_\star$ for the decomposition problem (4.2) with parameters $(\lambda_i)$.

Then if $Q \subseteq \mathbb{R}^{m \times n}$ is a random subspace distributed according to the Haar measure, with

$$\dim(Q) \ge C_Q \cdot \dim(T_1 + \cdots + T_\tau) \cdot \log^2 m, \tag{4.8}$$

there exists an $(\alpha', \beta')$-inexact certificate for $x_\star$ for the compressive decomposition problem (4.3) with

$$\alpha' \le \alpha + m^{-3}\|\Lambda\|_F, \tag{4.9}$$

$$\beta' \le \beta + C_1 \max_i \frac{\nu_i + \sqrt{\log m}}{\lambda_i} \left( \frac{\|\Lambda\|_F^2 \log m}{\dim(Q)} \right)^{1/2}, \tag{4.10}$$

with probability at least $1 - C_2 \cdot \tau \cdot m^{-9}$ in $Q$. Above, $C_Q$, $C_1$ and $C_2$ are positive numerical constants.

Remark 4.8. As will become clear in the proof, the degrees $m^{-3}$ and $m^{-9}$ above are arbitrary, and can be set to be any constants by appropriate choice of $C_Q$, $C_1$, $C_2$.



Remark 4.9 (Scaling in (4.10)). A casual glance at (4.10) may suggest that we can make $\beta'$ arbitrarily close to $\beta$, by setting $(\lambda_i)$ large. This is actually not the case: the initial certificate $\Lambda$ must satisfy $\mathcal{P}_{T_i}\Lambda \approx \lambda_i S_i$. Scaling all of the $\lambda_i$ by the same amount will also scale $\Lambda$, causing no effective change to the right hand side of (4.10).

On the other hand, Theorem 4.7 suggests an interesting practical role for the expected norms $\nu_i$ in choosing the relative values of $\lambda_i$. Namely, it suggests setting $\lambda_i \propto \nu_i$. This is consistent (within logarithmic factors) with suggestions in [CLMW11], and could suggest a principled way of combining many such structure-inducing norms as in (4.3).
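To make the "within logarithmic factors" comparison concrete for PCP, here is a short worked calculation. It assumes the standard estimates $\mathbb{E}\|G\| = \Theta(\sqrt{m})$ for the operator norm of an $m \times n$ Gaussian matrix with $m \ge n$, and $\mathbb{E}\|G\|_\infty = \Theta(\sqrt{\log(mn)})$ for its largest entry; these estimates are not derived in this section.

```latex
\nu_1 = \mathbb{E}\|G\| = \Theta(\sqrt{m}), \qquad
\nu_2 = \mathbb{E}\|G\|_\infty = \Theta\big(\sqrt{\log(mn)}\big)
\quad\Longrightarrow\quad
\frac{\nu_2}{\nu_1} = \Theta\!\left(\sqrt{\frac{\log m}{m}}\right),
\qquad\text{versus}\qquad
\frac{\lambda_2}{\lambda_1} = \frac{1}{\sqrt{m}} \ \text{ for PCP.}
```

The two ratios differ by a factor of order $\sqrt{\log m}$, which is the logarithmic discrepancy referred to above.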



Proof of Theorem 4.7. Let

$$S = T_1 + \cdots + T_\tau + \mathrm{span}(\Lambda), \tag{4.11}$$

where $+$ denotes Minkowski summation. Then $S$ is a linear subspace of dimension at most $\dim(T_1 + \cdots + T_\tau) + 1$ containing $\Lambda$. Our goal is to generate a certificate $\Lambda_\star$ that is close to $\Lambda$ on $S$, and also satisfies

$$\mathcal{P}_{Q^\perp}\Lambda_\star = 0. \tag{4.12}$$

Such a $\Lambda_\star$ would inherit the good properties of $\Lambda$ on $T_1 + \cdots + T_\tau \subseteq S$, and also satisfy the additional equality constraint (4.12), in effect certifying that the measurements are sufficient.



To this end, we will set

$$\Lambda_0 = 0. \tag{4.13}$$

We will generate inductively a sequence $(\Lambda_j)_{j=1,\dots,k}$ for appropriate $k$, such that with high probability $\Lambda_\star = \Lambda_k$ is the desired certificate. The initial guess $\Lambda_0$ obviously satisfies (4.12), but could be very far from $\Lambda$ on $S$. Define the error at step $j$ to be

$$E_j = \mathcal{P}_S[\Lambda_j] - \Lambda \in S. \tag{4.14}$$



We will generate a sequence of corrections, each of which lies in $Q$, that drive $E_j$ toward zero. By orthogonal invariance, $Q$ is equal in distribution to the linear span of

$$H_1, \dots, H_{\dim(Q)},$$

where the $H_j$ are independent iid $\mathcal{N}(0, 1/mn)$ random matrices. Choose from $\{1, \dots, \dim(Q)\}$

$$k = \lceil 3 \log_2 m \rceil \tag{4.15}$$

disjoint subsets $I_1, \dots, I_k$ of size

$$\gamma = \frac{\dim(Q)}{k}. \tag{4.16}$$

Our choice of constant ensures that $2^{-k} \le m^{-3}$. We will require that

$$\gamma \ge C_3 \cdot \dim(S), \tag{4.17}$$

where $C_3$ is a numerical constant to be specified later. Since by assumption

$$\dim(Q) \ge C_Q \cdot \dim(T_1 + \cdots + T_\tau) \cdot \log^2(m),$$

once $C_3$ is chosen, we can ensure that $C_Q$ is large enough that (4.17) holds.

Let $\mathcal{A}_j : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ denote the semidefinite operator that acts via

$$\mathcal{A}_j[\cdot] = \sum_{i \in I_j} H_i \langle H_i, \cdot \rangle.$$

Notice that $\mathbb{E}[\mathcal{A}_j] = \frac{\gamma}{mn}\mathcal{I}$. For $j = 1, \dots, k$, let

$$\Lambda_j = \Lambda_{j-1} - \frac{mn}{\gamma}\mathcal{A}_j E_{j-1}. \tag{4.18}$$



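Then we have

$$E_j = \mathcal{P}_S[\Lambda_j] - \Lambda = \mathcal{P}_S[\Lambda_{j-1}] - \Lambda - \mathcal{P}_S \frac{mn}{\gamma}\mathcal{A}_j E_{j-1} \tag{4.19}$$

$$\phantom{E_j} = \mathcal{P}_S\left[\mathcal{I} - \frac{mn}{\gamma}\mathcal{A}_j\right]\mathcal{P}_S E_{j-1}. \tag{4.20}$$

In paragraph (i) below, we will use this expression to control the Frobenius norm of $E_k$. We may further write

$$\Lambda_k = \mathcal{P}_S[\Lambda_k] + \mathcal{P}_{S^\perp}[\Lambda_k] = \Lambda + E_k - \sum_{j=1}^{k} \mathcal{P}_{S^\perp}\frac{mn}{\gamma}\mathcal{A}_j \mathcal{P}_S E_{j-1}, \tag{4.21}$$

where in (4.21) we have used that $E_j \in S$ for all $j$. In paragraphs (ii)-(iii) below, we will use this final expression to control the dual norms of $\Lambda_k$.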



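(i) Driving $E$ to zero. Ensuring that $C_3$ in (4.17) is sufficiently large that the hypotheses of Lemma 5.1 are verified, we have that with probability at least $1 - C_4 \exp(-c_1 \gamma)$,

$$\left\|\mathcal{P}_S\left[\mathcal{I} - \frac{mn}{\gamma}\mathcal{A}_j\right]\mathcal{P}_S\right\| = \left\|\mathcal{P}_S \frac{mn}{\gamma}\mathcal{A}_j\mathcal{P}_S - \mathcal{P}_S\right\| \le \frac{1}{2}. \tag{4.22}$$

Hence, by (4.20), we have $\|E_j\|_F \le \frac{1}{2}\|E_{j-1}\|_F$ for each $j$, on the complement of a bad event $\mathcal{E}_{\mathrm{err}}$ of probability at most $C_4 k \exp(-c_1 \gamma)$. On $\mathcal{E}_{\mathrm{err}}^c$, $\|E_j\|_F \le 2^{-j}\|E_0\|_F$ for each $j$, giving

$$\|E_k\|_F \le 2^{-k}\|\Lambda\|_F \quad\text{and}\quad \sum_j \|E_j\|_F \le 2\|\Lambda\|_F.$$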

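(ii) Analysis of $\alpha'$. From the definition, we may set $\alpha' = \max_i \|\mathcal{P}_{T_i}\Lambda_k - \lambda_i S_i\|_F$. On $\mathcal{E}_{\mathrm{err}}^c$, since $T_i \subseteq S$,

$$\|\mathcal{P}_{T_i}\Lambda_k - \lambda_i S_i\|_F = \|\mathcal{P}_{T_i}[\Lambda + E_k] - \lambda_i S_i\|_F \tag{4.23}$$

$$\le \|\mathcal{P}_{T_i}\Lambda - \lambda_i S_i\|_F + \|E_k\|_F$$

$$\le \|\mathcal{P}_{T_i}\Lambda - \lambda_i S_i\|_F + 2^{-k}\|\Lambda\|_F$$

$$\le \|\mathcal{P}_{T_i}\Lambda - \lambda_i S_i\|_F + m^{-3}\|\Lambda\|_F. \tag{4.24}$$

Since the first term is bounded by $\alpha$, the claim in (4.9) is established.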

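(iii) Analysis of $\beta'$. Similarly, for $\beta'$, we can take

$$\beta' = \max_{i=1,\dots,\tau} \lambda_i^{-1}\|\mathcal{P}_{T_i^\perp}\Lambda_k\|_{(i)}^*. \tag{4.25}$$

From (4.21) and the triangle inequality, we have

$$\|\mathcal{P}_{T_i^\perp}\Lambda_k\|_{(i)}^* \le \|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^* + \|E_k\|_F + \sum_{j=1}^{k}\left\|\mathcal{P}_{S^\perp}\frac{mn}{\gamma}\mathcal{A}_j\mathcal{P}_S E_{j-1}\right\|_{(i)}^*, \tag{4.26}$$

where we used the fact that whenever the primal norm majorizes the Frobenius norm, its dual minorizes the Frobenius norm. Applying Lemma 5.2, this is bounded by

$$\|\mathcal{P}_{T_i^\perp}\Lambda_k\|_{(i)}^* \le \lambda_i\beta + 2^{-k}\|E_0\|_F + 10\,\frac{\nu_i + \sqrt{\log m}}{\sqrt{\gamma}}\sum_{j=1}^{k}\|E_{j-1}\|_F$$

$$\le \lambda_i\beta + \left(2^{-k} + 20\,\frac{\nu_i + \sqrt{\log m}}{\sqrt{\gamma}}\right)\|E_0\|_F$$

$$\le \lambda_i\beta + 21\,\frac{\nu_i + \sqrt{\log m}}{\sqrt{\gamma}}\,\|\Lambda\|_F, \tag{4.27}$$

on the complement of an event of probability at most $2km^{-10} + k\exp(-\gamma/2) + \mathbb{P}[\mathcal{E}_{\mathrm{err}}]$. In the final line, we have used that $2^{-k} \le m^{-3}$ and $\gamma \le m^2$. Since $\gamma$ is at least a numerical constant times $\dim(Q)/\log m$, for some numerical constant $C_5$,

$$\lambda_i^{-1}\|\mathcal{P}_{T_i^\perp}\Lambda_k\|_{(i)}^* \le \beta + C_5\,\frac{\nu_i + \sqrt{\log m}}{\lambda_i}\left(\frac{\|\Lambda\|_F^2\log m}{\dim(Q)}\right)^{1/2}. \tag{4.28}$$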



Taking a union bound over $i = 1, \dots, \tau$, we have that the desired bounds on $\alpha'$ and $\beta'$ hold simultaneously on the complement of an event of probability

$$2k\tau m^{-10} + k\tau\exp\left(-\frac{\gamma}{2}\right) + C_4(1+\tau)k\exp(-c_1\gamma). \tag{4.29}$$

Since $\gamma = \Omega(C_Q \log m)$, provided $C_Q$ is large enough, the exponential terms will also be bounded by $m^{-10}$. Using that $k = O(\log m)$, we obtain the promised bound on the probability of failure. □
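The iteration (4.13)-(4.18) can be simulated directly. The sketch below works with vectorized matrices, so $\mathbb{R}^{m\times n}$ is identified with $\mathbb{R}^N$, $N = mn$. It is a toy illustration under assumed sizes ($d = \dim(S) = 3$, $\gamma = 200$, $k = 5$, chosen for speed rather than from (4.15)-(4.17)), and the function name `golfing_upgrade` is hypothetical.

```python
import numpy as np

def golfing_upgrade(Lam, B, H, k):
    """Run k steps of Lambda_j = Lambda_{j-1} - (N/gamma) A_j E_{j-1}.

    Lam : (N,) vectorized certificate, assumed to lie in S = range(B)
    B   : (N, d) matrix with orthonormal columns spanning S
    H   : (k*gamma, N) array whose rows are vectorized H_i ~ iid N(0, 1/N)
    Returns the final iterate and the error norms ||E_0||, ..., ||E_k||.
    """
    N = Lam.size
    gamma = H.shape[0] // k
    Lam_j = np.zeros(N)                              # Lambda_0 = 0
    E = B @ (B.T @ Lam_j) - Lam                      # E_0 = P_S[Lambda_0] - Lambda
    errors = [np.linalg.norm(E)]
    for j in range(k):
        Hj = H[j * gamma:(j + 1) * gamma]            # the batch I_j
        Lam_j = Lam_j - (N / gamma) * (Hj.T @ (Hj @ E))   # A_j E = sum_i H_i <H_i, E>
        E = B @ (B.T @ Lam_j) - Lam                  # E_j = P_S[Lambda_j] - Lambda
        errors.append(np.linalg.norm(E))
    return Lam_j, errors

rng = np.random.default_rng(0)
m, n, d, gamma, k = 40, 30, 3, 200, 5
N = m * n
B, _ = np.linalg.qr(rng.standard_normal((N, d)))
Lam = B @ rng.standard_normal(d)
H = rng.standard_normal((k * gamma, N)) / np.sqrt(N)

Lam_k, errors = golfing_upgrade(Lam, B, H, k)

# Lam_k should lie in Q = span{H_i}: measure its residual outside that span.
coef, *_ = np.linalg.lstsq(H.T, Lam_k, rcond=None)
outside_Q = np.linalg.norm(H.T @ coef - Lam_k)
```

Each correction is a combination of the $H_i$, so the final iterate lies in $Q$ by construction, while the error on $S$ contracts at each step whenever (4.22) holds for that batch.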



5 Key Probabilistic Lemmas 

This section introduces two probabilistic lemmas used in the analysis of the golfing scheme. The first lemma shows that $\mathcal{P}_S \frac{mn}{\gamma}\mathcal{A}_j\mathcal{P}_S \approx \mathcal{P}_S$. Its (routine) proof is delayed to the appendix. The second lemma is crucial for controlling the dual norms of $\mathcal{P}_{S^\perp}\frac{mn}{\gamma}\mathcal{A}_j\mathcal{P}_S$, and is proved in this section.

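Lemma 5.1. There exist numerical constants $C_1, C_2, c > 0$ such that the following holds. Let $S \subseteq \mathbb{R}^{m\times n}$ be a fixed linear subspace, and let $\mathcal{A} = \sum_{j=1}^{\gamma} H_j\langle H_j, \cdot\rangle$, where $(H_j)$ is a sequence of independent iid $\mathcal{N}(0, 1/mn)$ random matrices, and let $R = \mathrm{range}(\mathcal{A}) \subseteq \mathbb{R}^{m\times n}$. Then if

$$\gamma \ge C_1 \cdot \dim(S), \tag{5.1}$$

with probability at least $1 - C_2\exp(-c\gamma)$,

$$\left\|\mathcal{P}_S\frac{mn}{\gamma}\mathcal{A}\mathcal{P}_S - \mathcal{P}_S\right\| \le \frac{1}{2} \tag{5.2}$$

and

$$\left\|\mathcal{P}_S\mathcal{P}_R\mathcal{P}_S - \frac{\gamma}{mn}\mathcal{P}_S\right\| \le \frac{1}{16}\,\frac{\gamma}{mn}. \tag{5.3}$$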



The proof of this result follows a familiar covering argument. For completeness, we give this proof in Appendix B. We next state and prove the key probabilistic lemma for analyzing the "upgrade" procedure introduced in the proof of Theorem 4.7. This lemma allows us to control the dual norm of the constructed certificate. The result is as follows:



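Lemma 5.2. Let $S$ be any fixed subspace of $\mathbb{R}^{m\times n}$ $(m \ge n)$, $M$ any fixed matrix. Let $\mathcal{A} = \sum_{i=1}^{\gamma} H_i\langle H_i, \cdot\rangle$ be a random semidefinite operator constructed from a sequence of independent iid $\mathcal{N}(0, 1/mn)$ matrices $H_1, \dots, H_\gamma$. Let $\|\cdot\|$ be any norm that majorizes the Frobenius norm, and let $\|\cdot\|^*$ be its dual norm. Set $\nu = \mathbb{E}[\|G\|^*]$, with $G$ iid $\mathcal{N}(0,1)$. Then we have

$$\left\|\mathcal{P}_{S^\perp}\frac{mn}{\gamma}\mathcal{A}\mathcal{P}_S M\right\|^* \le 10\,\|\mathcal{P}_S M\|_F\,\frac{\nu + \sqrt{\log m}}{\sqrt{\gamma}} \tag{5.4}$$

with probability at least $1 - m^{-10} - \exp(-\gamma/2)$.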



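Proof. Let $\Gamma \subseteq \{N \mid \|N\|_F \le 1\}$ denote the unit ball for $\|\cdot\|$. Then

$$\|\mathcal{P}_{S^\perp}\mathcal{A}\mathcal{P}_S M\|^* = \sup_{N\in\Gamma}\langle N, \mathcal{P}_{S^\perp}\mathcal{A}\mathcal{P}_S M\rangle.$$

Inner products of this form are particularly easy to control because they involve projections of $\mathcal{A}$ onto orthogonal subspaces. Since $S$ and $S^\perp$ are orthogonal and $H_i$ is iid Gaussian, $\mathcal{P}_S H_i$ and $\mathcal{P}_{S^\perp}H_i$ are probabilistically independent. So, letting $H'_1, \dots, H'_\gamma$ denote an independent copy of $H_1, \dots, H_\gamma$, we have

$$\mathcal{P}_{S^\perp}\mathcal{A}\mathcal{P}_S[\cdot] = \mathcal{P}_{S^\perp}\sum_i H_i\langle H_i, \mathcal{P}_S[\cdot]\rangle = \sum_i \mathcal{P}_{S^\perp}H_i\langle\mathcal{P}_S H_i, \cdot\rangle \;=_d\; \sum_i \mathcal{P}_{S^\perp}H_i\langle\mathcal{P}_S H'_i, \cdot\rangle \doteq \mathcal{D}, \tag{5.5}$$

where $=_d$ denotes equality in distribution. Hence, we have

$$\|\mathcal{P}_{S^\perp}\mathcal{A}\mathcal{P}_S M\|^* =_d \|\mathcal{D}M\|^*.$$

We find the second term more convenient to analyze. Conditioned on $H'_1, \dots, H'_\gamma$,

$$\xi_N = \langle N, \mathcal{D}M\rangle = \sum_i \langle\mathcal{P}_S H'_i, M\rangle\,\langle\mathcal{P}_{S^\perp}N, H_i\rangle \tag{5.6}$$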



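is zero-mean Gaussian. We have $\|\mathcal{D}M\|^* = \sup_{N\in\Gamma}\xi_N$. A quick calculation shows that for any $N$ and $N'$,

$$\mathbb{E}\left[(\xi_N - \xi_{N'})^2 \mid H'_1, \dots, H'_\gamma\right] = \frac{\|\mathcal{P}_{S^\perp}(N - N')\|_F^2}{mn}\sum_{i=1}^{\gamma}\langle\mathcal{P}_S H'_i, M\rangle^2 \le \frac{\|N - N'\|_F^2}{mn}\sum_{i=1}^{\gamma}\langle\mathcal{P}_S H'_i, M\rangle^2 = \frac{\Sigma^2\,\|N - N'\|_F^2}{mn}, \tag{5.7}$$

where we have let

$$\Sigma = \left(\sum_{i=1}^{\gamma}\langle H'_i, \mathcal{P}_S M\rangle^2\right)^{1/2}. \tag{5.8}$$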



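Consider a second zero-mean Gaussian process $(\zeta_N)_{N\in\Gamma}$, defined by letting $G$ be an iid $\mathcal{N}(0, 1/mn)$ matrix, and setting $\zeta_N = \langle N, G\rangle$. From the definition of $\nu$,

$$\mathbb{E}\left[\sup_{N\in\Gamma}\zeta_N\right] = \frac{\nu}{\sqrt{mn}}. \tag{5.9}$$

Another calculation shows that

$$\mathbb{E}\left[(\zeta_N - \zeta_{N'})^2\right] = \frac{1}{mn}\|N - N'\|_F^2. \tag{5.10}$$

By Slepian's inequality (e.g., [Ver11] Lemma 5.33, [LT91] Chapter 3), we have

$$\mathbb{E}\left[\sup_N \xi_N \,\Big|\, H'_1, \dots, H'_\gamma\right] \le \Sigma\cdot\mathbb{E}\left[\sup_N\zeta_N\right] = \frac{\Sigma\,\nu}{\sqrt{mn}}. \tag{5.11}$$

Moreover, for any fixed values of $H'_1, \dots, H'_\gamma$ and any $N \in \Gamma$, $\xi_N$ is a $\Sigma$-Lipschitz function of the iid Gaussian sequence $(H_1, \dots, H_\gamma)$. Hence, the supremum $\|\mathcal{D}M\|^*$ is also $\Sigma$-Lipschitz. By Lipschitz concentration ([Led01] Proposition 2.18),

$$\mathbb{P}\left[\|\mathcal{D}M\|^* > \mathbb{E}\left[\|\mathcal{D}M\|^* \mid (H'_i)\right] + \frac{t\,\Sigma}{\sqrt{mn}} \,\Big|\, (H'_i)\right] \le \exp\left(-\frac{t^2}{2}\right). \tag{5.12}$$

Combining with our previous estimates, we have

$$\mathbb{P}\left[\|\mathcal{D}M\|^* > \frac{\Sigma(\nu + t)}{\sqrt{mn}} \,\Big|\, (H'_i)\right] \le \exp\left(-\frac{t^2}{2}\right). \tag{5.13}$$

Since this estimate holds for any value of $(H'_i)$, it holds unconditionally:

$$\mathbb{P}\left[\|\mathcal{D}M\|^* > \frac{\Sigma(\nu + t)}{\sqrt{mn}}\right] \le \exp\left(-\frac{t^2}{2}\right). \tag{5.14}$$

Moreover, it is easy to notice that $\Sigma$ is itself a $\|\mathcal{P}_S M\|_F$-Lipschitz function of the iid Gaussian sequence $(H'_1, \dots, H'_\gamma)$, with

$$\mathbb{E}[\Sigma] \le \left(\mathbb{E}[\Sigma^2]\right)^{1/2} = \sqrt{\frac{\gamma}{mn}}\,\|\mathcal{P}_S M\|_F. \tag{5.15}$$

From Lipschitz concentration,

$$\mathbb{P}\left[\Sigma > \mathbb{E}[\Sigma] + s\,\|\mathcal{P}_S M\|_F\right] \le \exp\left(-\frac{mn\,s^2}{2}\right), \tag{5.16}$$

and so, taking $s = \sqrt{\gamma/mn}$,

$$\mathbb{P}\left[\Sigma > 2\sqrt{\frac{\gamma}{mn}}\,\|\mathcal{P}_S M\|_F\right] \le \exp\left(-\frac{\gamma}{2}\right). \tag{5.17}$$

Combining this estimate with (5.14), setting $t = \sqrt{20\log m}$, and rescaling by $\frac{mn}{\gamma}$, we obtain the result. □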
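Both lemmas of this section can be sanity-checked numerically on a small instance. The sketch below draws one realization and evaluates the left-hand side of (5.2) and the two sides of (5.4), taking the nuclear norm as the primal norm so that the dual norm is the operator norm. The sizes, the seed, and the variable names (`dev_51`, `lhs_52`, `rhs_52`) are illustrative assumptions. A single draw illustrates the statements and does not verify the stated probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d, gamma = 20, 15, 3, 200
N = m * n

B, _ = np.linalg.qr(rng.standard_normal((N, d)))   # orthonormal basis of S (vectorized)
H = rng.standard_normal((gamma, N)) / np.sqrt(N)   # rows: vec(H_i), H_i ~ iid N(0, 1/mn)

def P_S(v):
    return B @ (B.T @ v)

def A(v):
    # A[v] = sum_i H_i <H_i, v>
    return H.T @ (H @ v)

# Lemma 5.1, (5.2): in the basis B, P_S (mn/gamma) A P_S - P_S is (N/gamma)(HB)^T(HB) - I.
HB = H @ B
dev_51 = np.linalg.norm((N / gamma) * (HB.T @ HB) - np.eye(d), 2)

# Lemma 5.2, (5.4): dual norm = operator norm (primal norm = nuclear norm).
M = rng.standard_normal(N)
PSM = P_S(M)
w = (N / gamma) * A(PSM)
lhs_52 = np.linalg.norm((w - P_S(w)).reshape(m, n), 2)
nu = np.mean([np.linalg.norm(rng.standard_normal((m, n)), 2) for _ in range(100)])
rhs_52 = 10 * np.linalg.norm(PSM) * (nu + np.sqrt(np.log(m))) / np.sqrt(gamma)
```

In this kind of experiment `lhs_52` is typically far below `rhs_52`, which reflects the generous constant 10 in (5.4).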



6 Proof of Lemma 4.5: Upgrade to Exact Certificate 



In this section, we prove Lemma 4.5, which shows how an inexact dual certificate can be upgraded to an exact certificate of optimality. The result will be a consequence of the following general lemma on systems of equations.

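Lemma 6.1. Let $T_1, \dots, T_k$ be independent subspaces of $\mathbb{R}^{m\times n}$, and $S_1 \in T_1, \dots, S_k \in T_k$. Then the system of equations

$$\mathcal{P}_{T_i}X = S_i, \qquad i = 1, \dots, k, \tag{6.1}$$

has a solution $X \in T_1 + \cdots + T_k$ satisfying

$$\|X\|_F \le \left(\frac{\sum_{i=1}^{k}\|S_i\|_F^2}{1 - (k-1)\max_{i\ne j}\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|}\right)^{1/2}. \tag{6.2}$$

Proof. Let $\mathrm{vec} : \mathbb{R}^{m\times n} \to \mathbb{R}^{mn}$ denote the operator that vectorizes a matrix by stacking its columns. For each $i$, let $U_i \in \mathbb{R}^{mn\times\dim(T_i)}$ denote a matrix whose columns form an orthonormal basis for $\mathrm{vec}[T_i]$. The system of equations is equivalent to

$$\begin{pmatrix} U_1^* \\ \vdots \\ U_k^* \end{pmatrix} x = \begin{pmatrix} U_1^*\,\mathrm{vec}[S_1] \\ \vdots \\ U_k^*\,\mathrm{vec}[S_k] \end{pmatrix}, \tag{6.3}$$

where $x = \mathrm{vec}[X]$. Let $U^*$ denote the matrix on the left hand side, and $s$ denote the vector on the right hand side. Then $\|s\|_2^2 = \sum_{i=1}^{k}\|S_i\|_F^2$. The system of equations has a solution $x \in \mathrm{range}(U) = \mathrm{vec}[T_1 + \cdots + T_k]$ with $\ell^2$ norm at most $\|s\|_2/\sigma_{\min}(U)$, and hence (6.1) has a solution whose Frobenius norm is bounded by the same quantity. Write

$$U^*U = \begin{pmatrix} I & U_1^*U_2 & \cdots & U_1^*U_k \\ U_2^*U_1 & I & \cdots & U_2^*U_k \\ \vdots & & \ddots & \vdots \\ U_k^*U_1 & U_k^*U_2 & \cdots & I \end{pmatrix}. \tag{6.4}$$

Let $\lambda$ be any eigenvalue of $U^*U$, with corresponding eigenvector $x = (x_1^*, x_2^*, \dots, x_k^*)^*$. Let $p = \arg\max_i \|x_i\|_2$. Then, looking at just the $p$-th block of the equation $\lambda x = U^*Ux$, we have

$$|\lambda - 1|\,\|x_p\|_2 = \Big\|\sum_{i\ne p}U_p^*U_i x_i\Big\|_2 \le \sum_{i\ne p}\|U_p^*U_i\|\,\|x_i\|_2 \le \|x_p\|_2 \times (k-1)\max_{i\ne j}\|U_i^*U_j\|. \tag{6.5}$$

Since $\|U_i^*U_j\| = \|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|$, we conclude that $\lambda_{\min}(U^*U) \ge 1 - (k-1)\max_{i\ne j}\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|$, and hence $\sigma_{\min}(U)$ is at least as large as the square root of this quantity. This establishes the result. □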
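The block-diagonal-dominance argument above can be checked numerically. The following sketch builds a few random low-dimensional subspaces of $\mathbb{R}^N$, compares $\lambda_{\min}(U^*U)$ with the lower bound $1 - (k-1)\max_{i\ne j}\|U_i^*U_j\|$, and solves the system (6.3) for a solution in $\mathrm{range}(U)$. The dimensions and the helper name `min_eig_lower_bound` are assumptions made for illustration.

```python
import numpy as np

def min_eig_lower_bound(bases):
    """Lower bound 1 - (k-1) * max_{i != j} ||U_i^* U_j|| on lambda_min(U^* U)."""
    k = len(bases)
    worst = max(np.linalg.norm(bases[i].T @ bases[j], 2)
                for i in range(k) for j in range(k) if i != j)
    return 1.0 - (k - 1) * worst

rng = np.random.default_rng(2)
N, dims = 400, [3, 4, 2]
bases = [np.linalg.qr(rng.standard_normal((N, d)))[0] for d in dims]   # U_1, ..., U_k
U = np.hstack(bases)

lam_min = np.linalg.eigvalsh(U.T @ U)[0]     # smallest eigenvalue of U^* U
bound = min_eig_lower_bound(bases)

# Right-hand sides S_i in T_i, and the solution of U^* x = s lying in range(U).
S = [b @ rng.standard_normal(b.shape[1]) for b in bases]
s = np.concatenate([b.T @ Si for b, Si in zip(bases, S)])
x = U @ np.linalg.solve(U.T @ U, s)
```

The solution `x` satisfies each block equation $U_i^* x = U_i^*\,\mathrm{vec}[S_i]$, and its norm is controlled by $\|s\|_2/\sigma_{\min}(U)$ as in the proof.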

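Proof of Lemma 4.5. The assumption implies that $T_1, \dots, T_\tau$ are independent subspaces, and so the system of equations

$$\mathcal{P}_{T_i}\Lambda_0 = \lambda_i S_i - \mathcal{P}_{T_i}\Lambda, \qquad i = 1, \dots, \tau \tag{6.6}$$

is feasible, and has a solution $\Lambda_0 \in T_1 + \cdots + T_\tau$ of Frobenius norm at most

$$\|\Lambda_0\|_F \le \left(\frac{\sum_i\|\lambda_i S_i - \mathcal{P}_{T_i}\Lambda\|_F^2}{1 - (\tau-1)\max_{i\ne j}\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|}\right)^{1/2} \le \alpha\left(\frac{\tau}{1 - (\tau-1)\max_{i\ne j}\|\mathcal{P}_{T_i}\mathcal{P}_{T_j}\|}\right)^{1/2}. \tag{6.7}$$

Moreover, since $T_1 + \cdots + T_\tau$ and $Q^\perp$ are independent, the system of equations

$$\mathcal{P}_{T_1+\cdots+T_\tau}\Lambda_\star = \Lambda_0, \qquad \mathcal{P}_{Q^\perp}\Lambda_\star = 0 \tag{6.8}$$

is feasible (indeed, underdetermined). We consider a solution $\Lambda_\star$ of minimum Frobenius norm. Under the stated hypotheses, this solution is given by the Neumann series

$$\Lambda_\star = \mathcal{P}_Q\sum_{i=0}^{\infty}\left(\mathcal{P}_{T_1+\cdots+T_\tau}\mathcal{P}_{Q^\perp}\mathcal{P}_{T_1+\cdots+T_\tau}\right)^i\Lambda_0, \tag{6.9}$$

whose norm is bounded as

$$\|\Lambda_\star\|_F \le \frac{\|\Lambda_0\|_F}{1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T_1+\cdots+T_\tau}\|^2}. \tag{6.10}$$

Set $\hat\Lambda = \Lambda + \Lambda_\star$, and observe that by construction, for each $i$, $\mathcal{P}_{T_i}\hat\Lambda = \lambda_i S_i$. For each $i$, we have

$$\|\mathcal{P}_{T_i^\perp}\hat\Lambda\|_{(i)}^* \le \|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^* + \|\mathcal{P}_{T_i^\perp}\Lambda_\star\|_{(i)}^*. \tag{6.11}$$

Because $\|\cdot\|_{(i)}$ majorizes the Frobenius norm, its dual minorizes the Frobenius norm, and so we have

$$\lambda_i^{-1}\|\mathcal{P}_{T_i^\perp}\hat\Lambda\|_{(i)}^* \le \lambda_i^{-1}\|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^* + \lambda_i^{-1}\|\mathcal{P}_{T_i^\perp}\Lambda_\star\|_F. \tag{6.12}$$

Under the stated hypotheses, this quantity is strictly smaller than one, and so $\hat\Lambda$ satisfies the conditions of Lemma 4.2. □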
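The Neumann series construction in this proof is easy to reproduce numerically. The sketch below uses one subspace $T$ standing in for $T_1 + \cdots + T_\tau$ and a random subspace $Q$ of $\mathbb{R}^N$, sums the series by repeated application of $\mathcal{P}_T\mathcal{P}_{Q^\perp}\mathcal{P}_T$, and checks the two constraints and the norm bound. The sizes, the truncation at 500 terms, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, dT, dQ = 60, 3, 50
BT, _ = np.linalg.qr(rng.standard_normal((N, dT)))   # basis of T (stand-in for T_1 + ... + T_tau)
BQ, _ = np.linalg.qr(rng.standard_normal((N, dQ)))   # basis of Q
P_T = BT @ BT.T
P_Q = BQ @ BQ.T
P_Qperp = np.eye(N) - P_Q

Lam0 = BT @ rng.standard_normal(dT)                  # right-hand side, an element of T
rho = np.linalg.norm(P_Qperp @ P_T, 2)               # ||P_{Q^perp} P_T||, must be < 1

# Truncated Neumann series: sum_i (P_T P_{Q^perp} P_T)^i Lam0, then project onto Q.
term, total = Lam0.copy(), np.zeros(N)
for _ in range(500):
    total += term
    term = P_T @ (P_Qperp @ (P_T @ term))
Lam_star = P_Q @ total
```

The series converges geometrically at rate $\|\mathcal{P}_{Q^\perp}\mathcal{P}_T\|^2$, which is where the denominator in the norm bound comes from.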



7 Proof of Theorem 2.1: Compressive PCP Recovery

In this section, we prove Theorem 2.1 using the general upgrade provided by Theorem 4.7. In the language of Section 4 we have $\|\cdot\|_{(1)} = \|\cdot\|_*$, $\|\cdot\|_{(2)} = \|\cdot\|_1$. Both of these norms majorize the Frobenius norm. For PCP, we take $\lambda_1 = 1$, $\lambda_2 = 1/\sqrt{m}$.

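Let $L_0, S_0$ denote the target pair, and $r$ the rank of $L_0$. If we let $L_0 = USV^*$ denote the rank-reduced singular value decomposition of $L_0$, and $T$ denote the subspace

$$T = \{UX^* + YV^* \mid X \in \mathbb{R}^{n\times r},\ Y \in \mathbb{R}^{m\times r}\}, \tag{7.1}$$

then the subdifferential of the nuclear norm at $L_0$ is

$$\partial\|\cdot\|_*(L_0) = \{\Lambda \mid \mathcal{P}_T\Lambda = UV^*,\ \|\mathcal{P}_{T^\perp}\Lambda\| \le 1\}. \tag{7.2}$$

It is easy to check that $\mathcal{P}_{T^\perp} : M \mapsto (I - UU^*)M(I - VV^*)$ is nonexpansive with respect to the operator norm $\|\cdot\|$, and so $\|\cdot\|_*$ indeed satisfies our criteria for a decomposable norm. Similarly, if we let $\Omega = \mathrm{supp}(S_0)$ and $\Sigma = \mathrm{sign}(S_0)$, then

$$\partial\|\cdot\|_1(S_0) = \{\Lambda \mid \mathcal{P}_\Omega\Lambda = \Sigma,\ \|\mathcal{P}_{\Omega^c}\Lambda\|_\infty \le 1\}. \tag{7.3}$$

Again, $\mathcal{P}_{\Omega^c}$ does not increase the $\ell^\infty$ norm, and $\|\cdot\|_1$ is also decomposable. In the language of Theorem 4.7 we have $T_1 = T$, $S_1 = UV^*$, $T_2 = \Omega$, $S_2 = \Sigma$. For the PCP problem, an $(\alpha,\beta)$-inexact certificate is therefore a matrix $\Lambda_{\mathrm{PCP}}$ satisfying

$$\|\mathcal{P}_T\Lambda_{\mathrm{PCP}} - UV^*\|_F \le \alpha,$$

$$\|\mathcal{P}_\Omega\Lambda_{\mathrm{PCP}} - \lambda\Sigma\|_F \le \alpha,$$

$$\|\mathcal{P}_{T^\perp}\Lambda_{\mathrm{PCP}}\| \le \beta,$$

$$\|\mathcal{P}_{\Omega^c}\Lambda_{\mathrm{PCP}}\|_\infty \le \beta\lambda.$$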
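The four conditions above translate directly into a small routine that, given a candidate matrix, returns the smallest $(\alpha, \beta)$ for which it is an inexact PCP certificate. This is a sketch under the stated definitions; the function name `pcp_certificate_parameters` and the toy instance are hypothetical, and the toy certificate is chosen only so that the arithmetic is easy to follow (its $\beta$ exceeds one, so it certifies nothing).

```python
import numpy as np

def pcp_certificate_parameters(Lam, U, V, omega, Sigma, lam):
    """Smallest (alpha, beta) such that Lam meets the four inexact-certificate conditions.

    U, V  : orthonormal factors (m x r, n x r) from the reduced SVD of L_0
    omega : boolean mask of supp(S_0); Sigma : sign(S_0); lam : the weight lambda
    """
    PU, PV = U @ U.T, V @ V.T
    P_T = PU @ Lam + Lam @ PV - PU @ Lam @ PV          # projection onto T
    P_Tperp = Lam - P_T                                # equals (I - UU^*) Lam (I - VV^*)
    alpha = max(np.linalg.norm(P_T - U @ V.T, 'fro'),
                np.linalg.norm(np.where(omega, Lam - lam * Sigma, 0.0), 'fro'))
    beta = max(np.linalg.norm(P_Tperp, 2),
               np.max(np.abs(np.where(omega, 0.0, Lam))) / lam)
    return alpha, beta

# Toy instance: L_0 = e_1 e_1^T and S_0 supported on one diagonal entry, so T is orthogonal to Omega.
m = n = 4
U = np.eye(m)[:, :1]
V = np.eye(n)[:, :1]
omega = np.zeros((m, n), dtype=bool); omega[1, 1] = True
Sigma = np.zeros((m, n)); Sigma[1, 1] = 1.0
lam = 0.5
Lam = U @ V.T + lam * Sigma
alpha, beta = pcp_certificate_parameters(Lam, U, V, omega, Sigma, lam)
```

Here both equality-type conditions hold exactly, so `alpha` is zero, while `beta` is driven by the entry of $UV^*$ off the support divided by $\lambda$.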



Such a certificate was constructed in [CLMW11]^2 under the hypotheses of Theorem 2.1. More precisely, we have the following:

2 In the notation of [CLMW11], the certificate constructed there is $\Lambda_{\mathrm{PCP}} = UV^* + W^L + W^S$.

Proposition 7.1 (Dual Certification for PCP [CLMW11]). Under the conditions of Theorem 2.1, on an event of probability at least $1 - Cm^{-10}$ the following hold:

(i)
$$\|\mathcal{P}_\Omega\mathcal{P}_T\| \le 1/2, \tag{7.4}$$

and (ii) there exists an $(m^{-2}, 1/4)$-inexact PCP certificate $\Lambda_{\mathrm{PCP}}$ for $(L_0, S_0)$, which satisfies

$$\|\Lambda_{\mathrm{PCP}}\|_F \le 4\sqrt{\mathrm{rank}(L_0)} + (4/3)\lambda\sqrt{\|S_0\|_0}. \tag{7.5}$$

Above, $C$ is numerical.



The careful reader may notice that the relaxation parameters $(\alpha, \beta)$ in Proposition 7.1 are stricter than those provided by [CLMW11], which gives $\alpha = 1/(4\sqrt{m})$, $\beta = 1/2$. In fact, by modifying the constants in the construction of [CLMW11], we can achieve $\beta$ smaller than any desired constant, and $\alpha$ smaller than any polynomial in $m^{-1}$, at the expense of slightly more stringent (but qualitatively equivalent) demands on $(L_0, S_0)$. The bound (7.5) is implied by the probabilistic lemmas in [CLMW11], but requires a bit of manipulation to obtain. Below, we will first prove Theorem 2.1 and then sketch a proof of the modifications to [CLMW11] needed to obtain the supporting result Proposition 7.1.



Proof of Theorem 2.1. From Lemma 4.5, to show that $(L_0, S_0)$ is the unique optimal solution to the compressive PCP problem, it is enough to show that

(I) $\|\mathcal{P}_T\mathcal{P}_\Omega\| \le 1/2$.

(II) There exists an $(\alpha', 1/2)$-inexact CPCP certificate $\Lambda_{\mathrm{CPCP}}$ with $\alpha' < \dfrac{1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2}{4\sqrt{m}}$.

We accomplish this in four parts. In paragraph (i) below, we apply Lemma 5.1 to lower bound $1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2$. In paragraph (ii), we use Proposition 7.1 to show (I) and the existence of an inexact PCP certificate $\Lambda_{\mathrm{PCP}}$. In paragraph (iii) we use Theorem 4.7 to upgrade this to an inexact CPCP certificate $\Lambda_{\mathrm{CPCP}}$ that satisfies property (II). Paragraph (iv) completes the proof by showing that the probability of failure is appropriately small.



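(i) Bounding $1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2$. We will apply Lemma 5.1 with $S = T + \Omega$. The lemma requires

$$\dim(Q) \ge C_1\cdot\dim(T + \Omega).$$

The dimension of $T + \Omega$ is a random variable, which depends on the size of the support set $\Omega$. Let $\mathcal{E}_\Omega$ denote the event

$$\mathcal{E}_\Omega = \{|\Omega| \le 2\rho mn + m\}. \tag{7.6}$$

Notice that $|\Omega|$ is a sum of $mn$ $\mathrm{Ber}(\rho)$ random variables. By Bernstein's inequality,

$$\mathbb{P}\left[|\Omega| > \rho mn + t\right] \le \exp\left(-\frac{t^2/2}{\rho mn + t/3}\right). \tag{7.7}$$

Setting $t = \rho mn + m$ and simplifying, we obtain

$$\mathbb{P}\left[\mathcal{E}_\Omega^c\right] \le \exp\left(-\frac{3m}{10}\right). \tag{7.8}$$

On $\mathcal{E}_\Omega$, we have

$$\dim(\Omega + T) \le 2\rho mn + m + 2mr \le 3\cdot(\rho mn + mr). \tag{7.9}$$

Comparing (7.9) to the condition on $\dim(Q)$ in Theorem 2.1, we can see that on $\mathcal{E}_\Omega$, the conditions of Lemma 5.1 are satisfied. Now, let $S = T + \Omega$, set $B = \{X \in S \mid \|X\|_F = 1\}$ and notice that

$$1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_S\|^2 = \inf_{X\in B}\ \langle X, X\rangle - \langle\mathcal{P}_{Q^\perp}\mathcal{P}_S X, \mathcal{P}_{Q^\perp}\mathcal{P}_S X\rangle$$

$$= \inf_{X\in B}\ \langle X, (\mathcal{P}_S - \mathcal{P}_S\mathcal{P}_{Q^\perp}\mathcal{P}_S)X\rangle$$

$$\ge \frac{\dim(Q)}{mn} - \sup_{X\in B}\left|\left\langle X, \left[\mathcal{P}_S\mathcal{P}_Q\mathcal{P}_S - \frac{\dim(Q)}{mn}\mathcal{P}_S\right]X\right\rangle\right|$$

$$\ge \frac{\dim(Q)}{mn} - \left\|\mathcal{P}_S\mathcal{P}_Q\mathcal{P}_S - \frac{\dim(Q)}{mn}\mathcal{P}_S\right\|. \tag{7.10}$$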



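Let $\mathcal{E}_Q$ be the event $\left\{\left\|\mathcal{P}_S\mathcal{P}_Q\mathcal{P}_S - \frac{\dim(Q)}{mn}\mathcal{P}_S\right\| \le \frac{\dim(Q)}{16\,mn}\right\}$. Using Lemma 5.1 and $\dim(S) = \dim(T\oplus\Omega) \ge m$, we have

$$\mathbb{P}[\mathcal{E}_Q \mid \mathcal{E}_\Omega] \ge 1 - C_2\exp(-c_1 m). \tag{7.11}$$

On $\mathcal{E}_Q$,

$$1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2 \ge \frac{15}{16}\,\frac{\dim(Q)}{mn}.$$

Since by assumption $\dim(Q) \ge C_Q \times \log^2 m \times \dim(T + \Omega) \ge C_Q \times \log^2 m \times m$ and $m \ge n$, ensuring that $C_Q > 16/15$, we can further conclude that

$$1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2 \ge \frac{1}{m}. \tag{7.12}$$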
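For completeness, the simplification that turns Bernstein's inequality into a bound of the form $\exp(-3m/10)$ on $\mathbb{P}[\mathcal{E}_\Omega^c]$ can be spelled out as follows. This is a worked calculation, writing $\rho$ for the Bernoulli parameter of the support and assuming Bernstein's inequality in the form $\mathbb{P}[|\Omega| > \rho mn + t] \le \exp\left(-\frac{t^2/2}{\rho mn + t/3}\right)$, with $t = \rho mn + m$.

```latex
\frac{t^2/2}{\rho mn + t/3}
  = \frac{3t^2}{2\,(3\rho mn + t)}
  = \frac{3t^2}{2\,(4\rho mn + m)}
  \;\ge\; \frac{3t^2}{8t}
  = \frac{3t}{8}
  \;\ge\; \frac{3m}{8}
  \;\ge\; \frac{3m}{10},
\qquad\text{using } 4\rho mn + m \le 4t \text{ and } t \ge m .
```

Hence $\mathbb{P}[|\Omega| > 2\rho mn + m] \le \exp(-3m/8) \le \exp(-3m/10)$.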



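(ii) Inexact PCP Certificate. By Proposition 7.1, on an event $\mathcal{E}_{\mathrm{PCP}}$ of probability at least $1 - C_2 m^{-10}$, we have $\|\mathcal{P}_T\mathcal{P}_\Omega\| \le 1/2$, and there exists an $(m^{-2}, 1/4)$-inexact PCP certificate $\Lambda_{\mathrm{PCP}}$ for $(L_0, S_0)$, with

$$\|\Lambda_{\mathrm{PCP}}\|_F \le C_3\sqrt{\mathrm{rank}(L_0)} + 2\lambda\sqrt{|\Omega|}. \tag{7.13}$$

Moreover, $\mathcal{E}_{\mathrm{PCP}}$ is independent of $Q$. We rewrite the bound (7.13) a bit for later use. Using $(a+b)^2 \le 2a^2 + 2b^2$, we have

$$m\|\Lambda_{\mathrm{PCP}}\|_F^2 \le m\left(2C_3^2 r + 8\lambda^2|\Omega|\right),$$

which on $\mathcal{E}_\Omega$ gives

$$m\|\Lambda_{\mathrm{PCP}}\|_F^2 \le C_4(\rho mn + mr), \tag{7.14}$$

where $C_4$ is numerical.

(iii) Upgrade to CPCP Certificate. Now, condition on $\mathcal{E}_{\mathrm{PCP}}$ and $\mathcal{E}_\Omega$. By our assumption on $\dim(Q)$, the conditions of Theorem 4.7 are satisfied. On an event $\mathcal{E}_{\mathrm{upgrade}}$ of conditional probability at least $1 - C_5 m^{-9}$, the certificate $\Lambda_{\mathrm{PCP}}$ can be refined to an $(\alpha', \beta')$-inexact CPCP certificate $\Lambda_{\mathrm{CPCP}}$, with

$$\alpha' \le m^{-2} + m^{-3}\|\Lambda_{\mathrm{PCP}}\|_F, \tag{7.15}$$

and

$$\beta' \le \frac{1}{4} + C_6\left(\frac{\|\Lambda_{\mathrm{PCP}}\|_F^2\log m}{\dim(Q)}\right)^{1/2}\times\max\left\{\mathbb{E}[\|G\|] + \sqrt{\log m},\ \mathbb{E}[\|G\|_\infty]\sqrt{m} + \sqrt{m\log m}\right\},$$

where $G$ is iid $\mathcal{N}(0,1)$. Furthermore, provided $mn > 1$, we have the bounds

$$\mathbb{E}\|G\| \le 2\sqrt{m} \quad\text{and}\quad \mathbb{E}\|G\|_\infty \le 3\sqrt{2\log m}, \tag{7.16}$$

and so

$$\beta' \le \frac{1}{4} + C_7\left(\frac{m\,\|\Lambda_{\mathrm{PCP}}\|_F^2\log^2 m}{\dim(Q)}\right)^{1/2} \le \frac{1}{4} + \left(\frac{C_8(\rho mn + mr)\log^2 m}{\dim(Q)}\right)^{1/2},$$

where we have used (7.14). Ensuring that the constant $C_Q$ in the statement of the theorem is larger than $16C_8$, we can conclude that $\beta' < 1/2$.

Referring to property (II) above, all that is left to show is that $\alpha' < \frac{1 - \|\mathcal{P}_{Q^\perp}\mathcal{P}_{T\oplus\Omega}\|^2}{4\sqrt{m}}$. Using paragraph (i), on $\mathcal{E}_Q \cap \mathcal{E}_\Omega$, it suffices to show $\alpha' < \frac{1}{4m^{3/2}}$. Using (7.14), ensuring that the constants $c_r$, $c_\rho$ in the statement of Theorem 2.1 are sufficiently small (say, each smaller than $1/(2C_4)$), we may conclude that $\|\Lambda_{\mathrm{PCP}}\|_F \le \sqrt{m}$. Hence, we have $\alpha' \le m^{-2} + m^{-5/2}$, which is strictly smaller than $\frac{1}{4m^{3/2}}$ provided $m$ is sufficiently large.

We have shown that on

$$\mathcal{E}_{\mathrm{good}} = \mathcal{E}_\Omega \cap \mathcal{E}_Q \cap \mathcal{E}_{\mathrm{PCP}} \cap \mathcal{E}_{\mathrm{upgrade}},$$

(I)-(II) hold, and hence $(L_0, S_0)$ is the unique optimal solution to the CPCP problem.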



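(iv) Probability. We have

$$\mathbb{P}[\mathcal{E}_{\mathrm{good}}^c] \le \mathbb{P}[(\mathcal{E}_Q\cap\mathcal{E}_\Omega)^c] + \mathbb{P}[(\mathcal{E}_{\mathrm{upgrade}}\cap\mathcal{E}_{\mathrm{PCP}}\cap\mathcal{E}_\Omega)^c]$$

$$= 1 - \mathbb{P}[\mathcal{E}_Q\mid\mathcal{E}_\Omega]\,\mathbb{P}[\mathcal{E}_\Omega] + 1 - \mathbb{P}[\mathcal{E}_{\mathrm{upgrade}}\mid\mathcal{E}_{\mathrm{PCP}}\cap\mathcal{E}_\Omega]\,\mathbb{P}[\mathcal{E}_{\mathrm{PCP}}\cap\mathcal{E}_\Omega]$$

$$\le 1 - \mathbb{P}[\mathcal{E}_Q\mid\mathcal{E}_\Omega] + \mathbb{P}[\mathcal{E}_\Omega^c] + 1 - \mathbb{P}[\mathcal{E}_{\mathrm{upgrade}}\mid\mathcal{E}_{\mathrm{PCP}}\cap\mathcal{E}_\Omega] + \mathbb{P}[\mathcal{E}_{\mathrm{PCP}}^c] + \mathbb{P}[\mathcal{E}_\Omega^c]$$

$$\le C_1\exp(-c_1 m) + C_2 m^{-10} + C_5 m^{-9} + 2\exp(-3m/10),$$

provided that $m$ is larger than some $m_0$. Consolidating bounds, we may conclude that correct recovery occurs with probability at least $1 - C_9 m^{-9}$, choosing $C_9$ such that the bound is nontrivial only for $m > m_0$. This completes the proof of Theorem 2.1. □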

We close by sketching the proof of Proposition 7.1.

Proof of Proposition 7.1 (sketch). Under the hypotheses, the bound $\|\mathcal{P}_\Omega\mathcal{P}_T\| \le 1/2$ follows immediately from Corollary 2.7 of [CLMW11]. [CLMW11] constructs a certificate $\Lambda_{\mathrm{PCP}}$ in three parts as

$$\Lambda_{\mathrm{PCP}} = UV^* + W^L + W^S. \tag{7.17}$$

The two terms $W^L$, $W^S$ will both be elements of $T^\perp$, so

$$\|\mathcal{P}_T\Lambda_{\mathrm{PCP}} - UV^*\|_F = 0. \tag{7.18}$$

Moreover, the term $W^S$ will satisfy $\mathcal{P}_\Omega W^S = \lambda\,\mathrm{sign}(S_0)$. So, we have

$$\|\mathcal{P}_\Omega\Lambda_{\mathrm{PCP}} - \lambda\,\mathrm{sign}(S_0)\|_F = \|\mathcal{P}_\Omega[UV^* + W^L]\|_F. \tag{7.19}$$

We can therefore take $\alpha = \|\mathcal{P}_\Omega[UV^* + W^L]\|_F$. To prove Proposition 7.1 it is therefore enough to show that with high probability the following properties are satisfied:

• (I) Structure constraint: $\|\mathcal{P}_\Omega[UV^* + W^L]\|_F \le m^{-2}$.

• (II) Dual norm constraints:

$$\|\mathcal{P}_{T^\perp}W^L\| \le 1/8, \qquad \|\mathcal{P}_{\Omega^c}[UV^* + W^L]\|_\infty \le \lambda/8, \tag{7.20}$$

and

$$\|\mathcal{P}_{T^\perp}W^S\| \le 1/8, \qquad \|\mathcal{P}_{\Omega^c}W^S\|_\infty \le \lambda/8. \tag{7.21}$$

• (III) Frobenius norm bounds: We have $\|\Lambda_{\mathrm{PCP}}\|_F \le \|UV^*\|_F + \|W^L\|_F + \|W^S\|_F$. The first term is simply $\sqrt{r}$. We will show that

$$\|W^L\|_F \le 3\sqrt{r}, \tag{7.22}$$

$$\|W^S\|_F \le \frac{4}{3}\lambda\sqrt{\|S_0\|_0}. \tag{7.23}$$


In paragraph (i) below, we review the construction of $W^L$ from [CLMW11], and show that with slight changes in the constants, (I) and (7.20) are satisfied with high probability. In paragraph (ii) we review the construction of $W^S$ and show that with high probability (7.21) is satisfied. Together, this implies that $\Lambda_{\mathrm{PCP}}$ is an $(m^{-2}, 1/4)$-certificate for the PCP problem. Paragraph (ii) will also show (7.23). Finally, in paragraph (iii), we show (7.22). This step involves the most additional work. Taken together, this establishes the proposition.

(i) Constructing $W^L$. The term $W^L$ is constructed to lie in $T^\perp$ and satisfy

$$\mathcal{P}_\Omega[UV^* + W^L] \approx 0.$$

This is accomplished via a golfing argument that writes the complement $\Omega^c$ as a union of $j_0$ subsets $\Gamma_1, \dots, \Gamma_{j_0}$, with $\Gamma_j \sim \mathrm{Ber}(q)$.^3 The parameter $q$ is set so that $\rho = (1-q)^{j_0}$, which ensures that $\Omega$ is indeed $\mathrm{Ber}(\rho)$. Notice that with this setting, we have $q \ge (1-\rho)/j_0$.

3 In [CLMW11], $\Gamma_j$ is denoted $\Omega_j$; we use the notation $\Gamma$ to avoid confusion between the support $\Omega$ and the subsets $\Gamma_j$ of the complement of the support.

The certificate is generated inductively, starting with $Y_0 = 0$. The error at step $j$ is

$$Z_j = \mathcal{P}_T Y_j - UV^*$$

and the corrective update is

$$Y_j = Y_{j-1} - q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}.$$

This leads to a recursive expression for the error $Z_j$:

$$Z_j = \mathcal{P}_T\left(\mathcal{I} - q^{-1}\mathcal{P}_{\Gamma_j}\right)\mathcal{P}_T Z_{j-1}, \tag{7.24}$$

which implies that the error decays quickly in both $\ell^\infty$ and Frobenius norm:

$$\|Z_j\|_\infty \le \tfrac{1}{2}\|Z_{j-1}\|_\infty \le 2^{-j}\|Z_0\|_\infty,$$

$$\|Z_j\|_F \le \tfrac{1}{2}\|Z_{j-1}\|_F \le 2^{-j}\|Z_0\|_F.$$

These inequalities are [CLMW11] (3.4)-(3.6), with $\varepsilon = 1/2$. Notice that [CLMW11] (3.3) is indeed satisfied, provided $c_r > 0$ is sufficiently small. Using

$$\|Z_0\|_\infty \le \sqrt{\frac{\mu r}{mn}}, \quad\text{and}\quad \|Z_0\|_F = \|UV^*\|_F = \sqrt{r}, \tag{7.25}$$

we obtain

$$\sum_{j=0}^{j_0}\|Z_j\|_\infty \le 2\sqrt{\frac{\mu r}{mn}} \quad\text{and}\quad \sum_{j=0}^{j_0}\|Z_j\|_F \le 2\sqrt{r}. \tag{7.26}$$

From arguments of [CLMW11], these bounds hold simultaneously on an event $\mathcal{E}_Z$ of probability at least $1 - C_1 m^{-10}$. After $j_0$ steps, the component $W^L$ is generated as

$$W^L = \mathcal{P}_{T^\perp}Y_{j_0} = -\mathcal{P}_{T^\perp}\sum_{j=1}^{j_0}q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}. \tag{7.27}$$

The proof of Lemma 2.8(b) of [CLMW11] shows that

$$\|\mathcal{P}_\Omega[UV^* + W^L]\|_F \le \|Z_{j_0}\|_F. \tag{7.28}$$

In [CLMW11], $j_0$ was chosen to ensure that $\|Z_{j_0}\|_F \le 1/(4\sqrt{m})$. Here, we set $j_0 = \lceil 3\log_2 m\rceil$, ensuring that $\|Z_{j_0}\|_F \le 2^{-j_0}\sqrt{r} \le m^{-2}$. The arguments of [CLMW11] establish the following:

$$\|\mathcal{P}_{T^\perp}W^L\| \le 2C'_0\sqrt{\frac{m\log m}{q}}\,\|UV^*\|_\infty, \tag{7.29}$$

$$\|\mathcal{P}_{\Omega^c}[UV^* + W^L]\|_\infty \le \|Z_{j_0}\|_\infty + 2q^{-1}\|UV^*\|_\infty, \tag{7.30}$$

where $C'_0$ is numerical. Moreover, we know that $q \ge \frac{c}{\log m}$, where $c > 0$ is numerical. Hence, using (7.25), we have, for numerical constants $C$, $C'$,

$$\|\mathcal{P}_{T^\perp}W^L\| \le C\sqrt{\frac{\mu r\log^2 m}{n}}, \tag{7.31}$$

$$\|\mathcal{P}_{\Omega^c}[UV^* + W^L]\|_\infty \le \frac{C'}{\sqrt{m}}\sqrt{\frac{\mu r\log^2 m}{n}}. \tag{7.32}$$

Recalling that by assumption $r \le c_r n/(\mu\log^2 m)$, we have the desired bounds (7.20), provided the constant $c_r$ is sufficiently small.

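(ii) Constructing $W^S$. The term $W^S$ is constructed to satisfy $W^S \in T^\perp$, $\mathcal{P}_\Omega W^S = \lambda\,\mathrm{sign}(S_0)$. More precisely, [CLMW11] set this term to be the minimum Frobenius norm solution to the system of equations

$$\mathcal{P}_T W^S = 0, \qquad \mathcal{P}_\Omega W^S = \lambda\,\mathrm{sign}(S_0). \tag{7.33}$$

This solution is given by the Neumann series

$$W^S = \mathcal{P}_{T^\perp}\sum_{j=0}^{\infty}\left(\mathcal{P}_\Omega\mathcal{P}_T\mathcal{P}_\Omega\right)^j[\lambda\,\mathrm{sign}(S_0)]. \tag{7.34}$$

So,

$$\|W^S\|_F \le \frac{\lambda\sqrt{|\Omega|}}{1 - \|\mathcal{P}_\Omega\mathcal{P}_T\|^2} \le \frac{4}{3}\lambda\sqrt{|\Omega|}. \tag{7.35}$$

Similar to paragraph (i) above, one can quickly check that by ensuring $\rho$ is smaller than some fixed constant and using the same arguments as [CLMW11], (7.21) is satisfied with high probability.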

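(iii) Bounding $\|W^L\|_F$. We use the fact that $\Gamma_j$ and $Z_{j-1}$ are independent random variables. By (7.27), it is enough to control the Frobenius norm of $q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}$ for each $j$. Notice that

$$\|q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}\|_F^2 = \|Z_{j-1}\|_F^2 + \sum_{kl}\left(q^{-1}\delta_{kl} - 1\right)[Z_{j-1}]_{kl}^2 \doteq \|Z_{j-1}\|_F^2 + \sum_{kl}H_{kl},$$

where $\delta_{kl}$ is an indicator for the event $(k,l) \in \Gamma_j$. Then $\mathbb{E}[H_{kl}] = 0$, $|H_{kl}| \le q^{-1}\|Z_{j-1}\|_\infty^2$ almost surely, and $\mathbb{E}[H_{kl}^2] \le q^{-1}[Z_{j-1}]_{kl}^4$. Summing, we have

$$\sum_{kl}\mathbb{E}[H_{kl}^2] \le q^{-1}\|Z_{j-1}\|_\infty^2\|Z_{j-1}\|_F^2. \tag{7.36}$$

By Bernstein's inequality,

$$\mathbb{P}\left[\|q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}\|_F^2 > \|Z_{j-1}\|_F^2 + t\right] \le \exp\left(-\frac{t^2}{2q^{-1}\|Z_{j-1}\|_\infty^2\|Z_{j-1}\|_F^2 + \frac{2}{3}q^{-1}\|Z_{j-1}\|_\infty^2\,t}\right).$$

By setting

$$t_j = C_2\max\left\{\|Z_{j-1}\|_\infty^2\,q^{-1}\log m,\ \ \|Z_{j-1}\|_\infty\|Z_{j-1}\|_F\sqrt{q^{-1}\log m}\right\}, \tag{7.37}$$

with appropriate numerical constant $C_2$, we can ensure that for each $j$,

$$\mathbb{P}\left[\|q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}\|_F^2 > \|Z_{j-1}\|_F^2 + t_j\right] \le m^{-11}. \tag{7.38}$$

Since we have $q \ge \frac{c}{\log m}$ for some positive numerical constant $c$, using $\sqrt{s+t} \le \sqrt{s} + \sqrt{t}$, on an event with overall probability at least $1 - j_0 m^{-11}$,

$$\|W^L\|_F \le \sum_{j=1}^{j_0}\|q^{-1}\mathcal{P}_{\Gamma_j}Z_{j-1}\|_F$$

$$\le \sum_{j=1}^{j_0}\left(\|Z_{j-1}\|_F + C_3\|Z_{j-1}\|_\infty\log m + C_4\sqrt{\|Z_{j-1}\|_\infty\|Z_{j-1}\|_F\log m}\right)$$

$$\le 2\sqrt{r} + 2C_3\log(m)\sqrt{\frac{\mu r}{mn}} + C_4\sqrt{\log m}\times\sum_{j=1}^{j_0}2^{-(j-1)}\|Z_0\|_\infty^{1/2}\|Z_0\|_F^{1/2}$$

$$\le 2\sqrt{r} + 2C_3\sqrt{\frac{\mu r\log^2 m}{mn}} + 2C_4\left(\frac{\mu r^2\log^2 m}{mn}\right)^{1/4}.$$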
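Recalling again the assumption $r \le c_r n/(\mu\log^2 m)$, and ensuring that $c_r$ is sufficiently small, the final two terms above are bounded by constants. In particular, we can conclude that $\|W^L\|_F \le 3\sqrt{r}$. This completes the proof. □

A Proof of Lemma 4.2: Optimality Conditions

Proof. Let $f$ denote the objective function. Consider a feasible perturbation $\delta = (\Delta_1, \dots, \Delta_\tau)$, so $\mathcal{P}_Q\sum_i\Delta_i = 0$. Then for any $W_1, \dots, W_\tau$ such that for each $i$, $W_i \in \partial\|\cdot\|_{(i)}(X_{i,\star})$, we have

$$f(x_\star + \delta) \ge f(x_\star) + \sum_i\lambda_i\langle W_i, \Delta_i\rangle. \tag{A.1}$$

By duality of norms, for each $i$ there exists $H_i \in \mathbb{R}^{m\times n}$ with $\|H_i\|_{(i)}^* \le 1$ and

$$\langle H_i, \mathcal{P}_{T_i^\perp}\Delta_i\rangle = \|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}. \tag{A.2}$$

Set $W_i = S_i + \mathcal{P}_{T_i^\perp}H_i$. From our definition of a decomposable norm, $\mathcal{P}_{T_i^\perp}$ is nonexpansive, and so $\|\mathcal{P}_{T_i^\perp}H_i\|_{(i)}^* \le 1$, and $W_i \in \partial\|\cdot\|_{(i)}(X_{i,\star})$. Moreover,

$$\langle W_i, \Delta_i\rangle = \langle\mathcal{P}_{T_i}W_i, \Delta_i\rangle + \langle\mathcal{P}_{T_i^\perp}W_i, \Delta_i\rangle$$

$$= \langle\mathcal{P}_{T_i}W_i, \mathcal{P}_{T_i}\Delta_i\rangle + \langle\mathcal{P}_{T_i^\perp}W_i, \mathcal{P}_{T_i^\perp}\Delta_i\rangle$$

$$= \langle S_i, \mathcal{P}_{T_i}\Delta_i\rangle + \langle H_i, \mathcal{P}_{T_i^\perp}\Delta_i\rangle$$

$$= \langle S_i, \mathcal{P}_{T_i}\Delta_i\rangle + \|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}. \tag{A.3}$$

Plugging in to (A.1), we have

$$f(x_\star + \delta) \ge f(x_\star) + \sum_i\langle\lambda_i S_i, \mathcal{P}_{T_i}\Delta_i\rangle + \lambda_i\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}$$

$$= f(x_\star) + \sum_i\langle\mathcal{P}_{T_i}\Lambda, \mathcal{P}_{T_i}\Delta_i\rangle + \lambda_i\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}$$

$$= f(x_\star) + \sum_i\langle\Lambda, \mathcal{P}_{T_i}\Delta_i\rangle + \lambda_i\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}$$

$$= f(x_\star) + \sum_i\langle\Lambda, \Delta_i\rangle - \langle\Lambda, \mathcal{P}_{T_i^\perp}\Delta_i\rangle + \lambda_i\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}$$

$$\ge f(x_\star) + \Big\langle\Lambda, \sum_i\Delta_i\Big\rangle + \sum_i\left(-\|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^*\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)} + \lambda_i\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}\right)$$

$$= f(x_\star) + \Big\langle\mathcal{P}_Q\Lambda, \sum_i\Delta_i\Big\rangle + \sum_i\left(\lambda_i - \|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^*\right)\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}$$

$$= f(x_\star) + \sum_i\left(\lambda_i - \|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^*\right)\|\mathcal{P}_{T_i^\perp}\Delta_i\|_{(i)}, \tag{A.4}$$

where we have used that $\langle\mathcal{P}_Q\Lambda, \sum_j\Delta_j\rangle = \langle\Lambda, \mathcal{P}_Q\sum_j\Delta_j\rangle = 0$, since $\delta$ is feasible. Since each of the $\|\mathcal{P}_{T_i^\perp}\Lambda\|_{(i)}^*$ is strictly smaller than $\lambda_i$, if any of the $\mathcal{P}_{T_i^\perp}\Delta_i$ are nonzero, then $f(x_\star + \delta) > f(x_\star)$. If, on the other hand, all of the $\mathcal{P}_{T_i^\perp}\Delta_i$ are zero, then $\Delta_i \in T_i$ for all $i$, and the constraint $\mathcal{P}_Q\sum_i\Delta_i = 0$ implies that $\sum_i\Delta_i \in (T_1 + \cdots + T_\tau)\cap Q^\perp$. If $\sum_i\Delta_i \ne 0$, this contradicts independence of $(T_1, \dots, T_\tau, Q^\perp)$. If $\sum_i\Delta_i = 0$, this contradicts independence of $T_1, \dots, T_\tau$ (which follows from independence of $(T_1, \dots, T_\tau, Q^\perp)$). So, we conclude that for any nonzero feasible perturbation $\delta$, $f(x_\star + \delta)$ is strictly larger than $f(x_\star)$. □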

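B Proof of Lemma 5.1: Operator Approximations

Proof. Fix a $1/4$-net $\Gamma$ for the unit ball restricted to $S$. By [Led01] Proposition 4.16, there exists such a net of size at most $\exp(\dim(S)\log 12)$. Let $H : \mathbb{R}^\gamma \to \mathbb{R}^{m\times n}$ via $Hx = \sum_{i=1}^{\gamma}H_i x_i$, and let $\psi : \mathbb{R}^\gamma \to \mathbb{R}^{m\times n}$ via $\psi x = \sum_{i=1}^{\gamma}\bar{H}_i x_i$, where $(\bar{H}_i)$ is an orthonormal sequence of matrices that span $R$. By the Bartlett decomposition, we may assume that $[\mathrm{vec}[\bar{H}_1] \mid \cdots \mid \mathrm{vec}[\bar{H}_\gamma]] \in \mathbb{R}^{mn\times\gamma}$ is distributed according to the Haar measure on the Stiefel manifold of $mn\times\gamma$ matrices with orthonormal columns. Moreover, we have $\mathcal{A} = HH^*$ and $\mathcal{P}_R = \psi\psi^*$.

A standard argument (see [Ver11] Lemma 5.4) gives that

$$\left\|\mathcal{P}_S\frac{mn}{\gamma}\mathcal{A}\mathcal{P}_S - \mathcal{P}_S\right\| = \sup_{X\in S,\ \|X\|_F=1}\left|\frac{mn}{\gamma}\|H^*X\|_2^2 - 1\right| \le 2\sup_{X\in\Gamma}\left|\frac{mn}{\gamma}\|H^*X\|_2^2 - 1\right|. \tag{B.1}$$

Notice that $\sqrt{mn/\gamma}\,H^*X$ is distributed as an iid $\mathcal{N}(0, 1/\gamma)$ random vector. Using Lemma 1 of [LM00],

$$\mathbb{P}\left[\left|\frac{mn}{\gamma}\|H^*X\|_2^2 - 1\right| \ge 2\sqrt{\frac{t}{\gamma}} + \frac{2t}{\gamma}\right] \le 2e^{-t}. \tag{B.2}$$

Choose $t = c_1\gamma$, with $c_1$ small enough that $4\sqrt{c_1} + 4c_1 < 1/2$. Take a union bound over all $\exp(\dim(S)\log 12)$ elements of $\Gamma$ to get

$$\mathbb{P}\left[\left\|\mathcal{P}_S\frac{mn}{\gamma}\mathcal{A}\mathcal{P}_S - \mathcal{P}_S\right\| > \frac{1}{2}\right] \le 2\exp\left(-c_1\gamma + \dim(S)\log 12\right). \tag{B.3}$$

Using the assumption that $\gamma \ge C_1\dim(S)$, and ensuring that $C_1$ is large enough that $c_1 > \frac{\log 12}{C_1}$ completes the proof of (5.2).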

For the second term, we repeat the argument, noting that

$$\left\|\frac{mn}{\gamma}\mathcal{P}_S\mathcal{P}_R\mathcal{P}_S - \mathcal{P}_S\right\| \le 2\sup_{X\in\Gamma}\left|\frac{mn}{\gamma}\|\psi^*X\|_2^2 - 1\right|. \tag{B.4}$$

Note that $\|\psi^*X\|_2^2 = \|(\mathrm{vec}\circ\psi)^*\mathrm{vec}[X]\|_2^2$. The operator $\mathrm{vec}\circ\psi : \mathbb{R}^\gamma \to \mathbb{R}^{mn}$ can be identified with an $mn\times\gamma$ matrix $U$, which per the above discussion can be taken to be distributed according to the Haar measure. By orthogonal invariance, for any fixed unit vector $x$, $U^*x$ is equal in distribution to the restriction of a uniformly distributed random unit vector $r \in \mathbb{S}^{mn-1}$ to its first $\gamma$ coordinates. Lemma 2.2 of [DG03] provides convenient tail bounds for the norm of such a coordinate restriction. Applying that lemma, we have that for every $t > 0$, there exists $c_t > 0$ such that

$$\mathbb{P}\left[\left|\frac{mn}{\gamma}\|\psi^*X\|_2^2 - 1\right| > t\right] \le \exp(-c_t\gamma). \tag{B.5}$$

Set $t = 1/32$. As above, ensuring that $C_1$ is larger than $\frac{\log 12}{c_t}$ and taking a union bound shows that with the desired probability $\left\|\frac{mn}{\gamma}\mathcal{P}_S\mathcal{P}_R\mathcal{P}_S - \mathcal{P}_S\right\| \le 1/16$. Rescaling gives the bound quoted in the statement of the lemma. □